UTILITY 
PATENT APPLICATION 
TRANSMITTAL 



Attorney Docket No. 5817-7L 



First Inventor or Application Identifier: Harrington, et al. 



Title of Invention: COMPOSITIONS AND METHODS FOR 
NON-TARGETED ACTIVATION OF ENDOGENOUS GENES 



Express Mail Label No. EL247263380US 



ADDRESS TO: ASSISTANT COMMISSIONER FOR PATENTS 
BOX PATENT APPLICATION 
WASHINGTON, DC 20231 

Transmitted herewith for filing in the United States Patent Office is a patent application for: 

Inventors: John J. Harrington, Bruce Sherf, and Stephen Rundlett 

1. [X] The Filing Fee has been calculated as shown below:

(Submit an original, and a duplicate for fee processing) 







                        No. Filed    No. Extra    Small Entity          Large Entity
                                                  Rate       Fee        Rate       Fee

BASIC FEE                                                    $345                  $0
TOTAL CLAIMS:           4 - 20 =     0            x 9 =      $0         x 18 =     $0
INDEP CLAIMS:           2 - 3 =      0            x 39 =     $0         x 78 =     $0
□ MULTIPLE DEPENDENT
  CLAIMS PRESENTED                                + 130 =    $          + 260 =    $

*If the difference in Column 1 is less than zero, enter "0" in Column 2.

                                                  TOTAL      $345       TOTAL      $
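
The arithmetic behind the fee table can be sketched as follows. This is a minimal illustration, not an official fee calculator: the function and parameter names are hypothetical, and the rates are simply the small-entity figures printed on this form (basic fee $345, $9 per claim over 20, $39 per independent claim over 3), which reflect the schedule in force when the form was filed.

```python
def excess(filed: int, included: int) -> int:
    """Column 2 of the form: claims beyond the included number, never below zero."""
    return max(0, filed - included)


def filing_fee(total_claims: int, indep_claims: int, *, basic_fee: int,
               per_extra_claim: int, per_extra_indep: int,
               multiple_dependent_fee: int = 0) -> int:
    """Sum the rows of the fee table (hypothetical helper, for illustration only)."""
    extra_total = excess(total_claims, 20)  # "TOTAL CLAIMS" row
    extra_indep = excess(indep_claims, 3)   # "INDEP CLAIMS" row
    return (basic_fee
            + extra_total * per_extra_claim
            + extra_indep * per_extra_indep
            + multiple_dependent_fee)


# Small-entity figures as printed on this transmittal: 4 total claims, 2 independent.
small_entity_total = filing_fee(4, 2, basic_fee=345,
                                per_extra_claim=9, per_extra_indep=39)
print(small_entity_total)  # 345
```

Because both claim counts fall below the included numbers, the "No. Extra" column is zero in each row and only the basic fee is due, which matches the $345 total and the enclosed check.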



The Commissioner is hereby authorized to credit overpayments or charge the following fees to Deposit Account 
No. 16-0605. 

a. [X] Fees required under 37 CFR 1.16 (National filing fees),
b. [X] Fees required under 37 CFR 1.17 (National application processing fees).
[X] A check in the amount of $345 is enclosed.



□ The above filing fee will be paid along with Applicant(s) Response to the Notice to File Missing
Parts.



2. [X] Specification; Total Pages 166

3. [X] 62 Sheets of Formal Drawing(s) (35 USC 113)

4. [X] Declaration and Power of Attorney; [Total Pages 2]

a. □ Newly executed (original or copy)

b. [X] Copy from a prior application (37 CFR 1.63(d))
(for continuation/divisional with Box 16 completed)

i. □ DELETION OF INVENTOR(S) Signed statement
attached deleting inventor(s) named in the prior
application, see 37 CFR 1.63(d)(2) & 1.33(b).

5. □ Microfiche Computer Program (Appendix)

6. [X] Nucleotide and/or Amino Acid Sequence Submission (if applicable, all necessary)

a. □ Computer Readable Copy

b. [X] Paper Copy (identical to computer copy)

c. [X] Statement verifying identity of above paper copy with
computer readable copy in prior application

ACCOMPANYING APPLICATION PARTS

7. □ Assignment Papers (cover sheet & document(s)) (including $40.00 fee)
8. 37 CFR 3.73(b) Statement (when there is an assignee); [X] Power of Attorney
9. □ English Translation Document (if applicable)
10. Information Disclosure Statement (IDS)/PTO-1449
11. Preliminary Amendment
12. [X] Return Receipt Postcard (MPEP 503) (Should be specifically itemized)
13. Small Entity Statement(s)
    [X] Statement as filed in prior application; status still proper and desired.
14. □ Certified Copy of Priority Document(s) (if foreign priority is claimed)
    Foreign Priority is
15. □ Other:

16. If a CONTINUING APPLICATION, check appropriate box and supply the requisite
information below and in a preliminary amendment:

□ Continuation [X] Divisional □ Continuation in Part (CIP)

of prior Application No: 09/276,820; Filed March 26, 1999, which is a CIP of
09/263,814 filed 3/8/99, which is a CIP of 09/253,022 filed 2/19/99, which is a CIP of 09/159,643 filed 9/24/98,
which is a CIP of 08/941,223 filed 09/26/97



(Utility Patent Application Transmittal) Page 2 of 3 



Prior Application Information: Examiner Ram Shukla Group/Art Unit: 1632 

For CONTINUATION or DIVISIONAL APPS only: The entire disclosure of the prior application, from which an oath or declaration is
supplied under Box 4b, is considered a part of the disclosure of the accompanying continuation or divisional application and is hereby
incorporated by reference. The incorporation can only be relied upon when a portion has been inadvertently omitted from the submitted
application parts.



17. CORRESPONDENCE ADDRESS

Customer Number or Bar Code Label 000826 

Attention Of: Anne Brown 



Signature: 

Attorney of Record: Anne Brown
Attorney Registration No. 36,463 
Tel Raleigh Office (919) 420-2200 
Fax Raleigh Office (919) 420-2260 



ALSTON & BIRD LLP 

P.O. Drawer 34009 
Charlotte NC 28234-4009 



"Express Mail" mailing label number EL247263380US 
Date of Deposit January 18, 2000 



I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express
Mail Post Office to Addressee" service under 37 CFR 1.10 on the date indicated above and is addressed to Box
Patent Application, Assistant Commissioner For Patents, Washington, DC 20231.

Nora C. Martinez

RTA01/2071689vl 









Statement Claiming Small Entity Status 
(37 C.F.R. §§ 1.9(d) and 1.27(c)) - Small Business Concern

Applicant or Patentee: John J. HARRINGTON, Bruce SHERF and Stephen RUNDLETT

[illegible] Attorney Docket No. [illegible]
Filed or Issued: March 26, 1999

For: Compositions and Methods for Non-targeted Activation of Endogenous Genes



I hereby state that I am 

[ ] the owner of the small business concern identified below:

[X ] an official of the small business concern empowered to act on behalf of the concern identified below: 

NAME OF SMALL BUSINESS CONCERN Athersys, Inc.

ADDRESS OF SMALL BUSINESS CONCERN 11000 Cedar Avenue, Cleveland, Ohio 44106



I hereby state that the above identified small business concern qualifies as a small business concern as defined in 13 C.F.R. § 121.3-18 and
reproduced in 37 C.F.R. § 1.9(d), for purposes of paying reduced fees under section 41(a) and (b) of Title 35, United States Code, in that
the number of employees of the concern, including those of its affiliates, does not exceed 500 persons. For purposes of this Statement, (1) 
the number of employees of the business concern is the average over the previous fiscal year of the concern of the persons employed on a full- 
time, part-time or temporary basis during each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other when either 
directly or indirectly, one concern controls or has the power to control the other, or a third party or parties controls or has the power to control 
both. 

I hereby state that rights under contract or law have been conveyed to and remain with the small business concern identified above with regard
to the invention described in: 



[ ] the specification filed herewith with title as listed above, 
[X ] the application identified above. 
[ ] the patent identified above.



If the rights held by the above identified small business concern are not exclusive, each individual, concern or organization having rights in
the invention must file separate statements indicating their status as small entities, and no rights to the invention are held by any person, other
than the inventor, who would not qualify as an independent inventor under 37 C.F.R. § 1.9(c) if that person made the invention, or by any
concern which would not qualify as a small business concern under 37 C.F.R. § 1.9(d) or a nonprofit organization under 37 C.F.R. § 1.9(b).

Each person, concern or organization having any rights in the invention (other than the small business concern named above) is listed

[X ] no such person, concern, or organization exists. 
[ ] each person, concern, or organization is listed below. 



( ) INDIVIDUAL (X) SMALL BUSINESS CONCERN ( ) NONPROFIT ORGANIZATION 

Separate statements are required from each named person, concern or organization having rights to the invention averring to their status as 
small entities. (37 C.F.R. § 1.27)

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of entitlement to small entity
status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance fee due after the date on which status as a small
entity is no longer appropriate. (37 C.F.R. § 1.28(b)) 



NAME OF PERSON SIGNING [handwritten entry illegible]
TITLE IN ORGANIZATION

ADDRESS OF PERSON SIGNING [handwritten entry illegible]

DATE 



SIGNATURE _ 



Attorney's Docket No. 5817-7L 

PATENT 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re: Harrington, et al. Group Art Unit: Not Yet Assigned 

Appl. No.: Not Yet Assigned Examiner: Not Yet Assigned 

Filed: Concurrently Herewith 

For: COMPOSITIONS AND METHODS FOR NON-TARGETED ACTIVATION 

OF ENDOGENOUS GENES 

January 18, 2000 

Assistant Commissioner for Patents 
Washington, DC 20231 

PRELIMINARY AMENDMENT 

Dear Sir: 

Please amend the above-identified application as follows: 
In The Specification:

Below the heading "Cross-Reference to Related Applications" and after the words "This
application", insert the following: --is a divisional application of U.S. Application No. 09/276,820,
filed March 26, 1999, entitled "COMPOSITIONS AND METHODS FOR NON-TARGETED
ACTIVATION OF ENDOGENOUS GENES" which--; and on line 2, in the blank, please insert
--09/263,814--.

In The Claims:



Please cancel claims 1-57, without prejudice to or disclaimer of the subject matter 
contained therein. 

Please add the following new claims:

58. (New) A method for drug discovery comprising: 

(a) integrating a vector into the genome of a eukaryotic host cell, wherein said 
vector integration activates expression of an endogenous gene in said host 
cell; 

(b) culturing said cell under conditions favoring expression of said activated 
gene, thereby producing a gene product of said activated gene; 

(c) treating said cell with one or more test compounds to be screened for drug 
activity; and 

(d) determining the ability of said one or more test compounds to interact with, 
or affect a cellular phenotype induced by, said gene product. 

59. (New) A method for drug discovery comprising: 

(a) integrating a vector into the genome of a eukaryotic host cell, wherein said 
vector integration activates expression of an endogenous gene in said host 
cell; 

(b) culturing said cell in reduced serum medium under conditions favoring 
production of a gene product of said activated gene, thereby producing 
cell-conditioned media comprising said gene product; and 

(c) screening one or more test compounds for drug activity by determining the 
ability of said test compounds to interact with said gene product in said 
cell-conditioned media. 



60. (New) The method of claim 59, further comprising concentrating said cell- 
conditioned media prior to said screening in (c). 

61. (New) The method of claim 59, further comprising isolating said gene product
prior to said screening in (c). 

REMARKS 

No new matter has been added by the foregoing amendment to the specification, which 
has been made solely to provide the application number for the priority application filed on March 
26, 1999, which number was not available at the time of filing of the present application. 

The foregoing amendments to the claims are fully supported in the specification as 
originally filed. Specifically, support for new claims 58-61 may be found in the specification at 
pages 6-17, at pages 38-44, at pages 50-53, at pages 57-61, at pages 68-118, and throughout the
Examples. Accordingly, the foregoing amendments to the claims do not add new matter; their 
entry is therefore respectfully requested. Upon entry of the foregoing amendments, claims 58-61 
are pending in the present application. 

Applicants believe that the present application is now in condition for examination. If the 
Examiner believes, for any reason, that personal communication will expedite prosecution of this 
application, the Examiner is invited to telephone the undersigned at the number provided. 

Prompt and favorable consideration of the foregoing amendments, and entry of the same 
into the present application, are respectfully requested. 

It is not believed that extensions of time or fees for net addition of claims are required, 
beyond those that may otherwise be provided for in documents accompanying this paper. 
However, in the event that additional extensions of time are necessary to allow consideration of 
this paper, such extensions are hereby petitioned under 37 CFR § 1.136(a), and any fee required
therefor (including fees for net addition of claims) is hereby authorized to be charged to Deposit
Account No. 16-0605. 



ALSTON & BIRD LLP 

P.O. Drawer 34009 

Charlotte, NC 28234 

Date of Deposit: January 18, 2000

I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express Mail Post
Office to Addressee" service under 37 CFR 1.10 on the date indicated above and is addressed to Box Patent
Application, Assistant Commissioner for Patents, Washington, DC 20231.

Anne Brown
Registration No. 36,463 

Compositions and Methods for Non-targeted Activation of
Endogenous Genes 



CROSS REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of U.S. Application No. 
of John J. Harrington, Bruce Sherf, and Stephen Rundlett, entitled 
"Compositions and Methods for Non-targeted Activation of Endogenous Genes," 
filed March 8, 1999, which is a continuation-in-part of U.S. Application No. 
09/253,022, filed February 19, 1999, which is a continuation-in-part of U.S. 
Application No. 09/159,643, filed September 24, 1998, which is a continuation-in- 
part of U.S. Application No. 08/941,223, filed September 26, 1997, the 
disclosures of all of which are incorporated herein by reference in their entireties. 

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention is in the fields of molecular biology and cellular 
biology. The invention is directed generally to activation of gene expression or 
causing over-expression of a gene by recombination methods in situ. More 
specifically, the invention is directed to activation of endogenous genes by non- 
targeted integration of specialized activation vectors, which are provided by the 
invention, into the genome of a host cell. The invention also is directed to 
methods for the identification, activation, and isolation of genes that were 
heretofore undiscoverable, and to host cells and vectors comprising such isolated 
genes. The invention also is directed to isolated genes, gene products, nucleic acid 
molecules, and compositions comprising such genes, gene products and nucleic 
acid molecules, that may be used in a variety of therapeutic and diagnostic 
applications. Thus, by the present invention, endogenous genes, including those 
associated with human disease and development, may be identified, activated, and 
isolated without prior knowledge of the sequence, structure, function, or
expression profile of the genes. 

Related Art 

Identification and over-expression of novel genes associated with human
disease is an important step towards developing new therapeutic drugs. Current
approaches to creating libraries of cells for protein over-expression are based on
the production and cloning of cDNA. Thus, in order to identify a new gene using
this approach, the gene must be expressed in the cells that were used to make the
library. The gene also must be expressed at sufficient levels to be adequately
represented in the library. This is problematic because many genes are expressed
only in very low quantities, in a rare population of cells, or during short
developmental periods.

Furthermore, because of the large size of some mRNAs, it is difficult or
impossible to produce full length cDNA molecules capable of expressing the
biologically active protein. Lack of full-length cDNA molecules has also been
observed for small mRNAs and is thought to be related to sequences in the
message that are difficult to produce by reverse transcription or that are unstable
during propagation in bacteria. As a result, even the most complete cDNA
libraries express only a fraction of the entire set of possible genes.

Finally, many cDNA libraries are produced in bacterial vectors. Use of
these vectors to express biologically active mammalian proteins is severely limited
since most mammalian proteins do not fold correctly and/or are improperly
glycosylated in bacteria.

Therefore, a method for creating a more representative library for protein
expression, capable of facilitating faithful expression of biologically active
proteins, would be extremely valuable.

Current methods for over-expressing proteins involve cloning the gene of
interest and placing it, in a construct, next to a suitable promoter/enhancer,
polyadenylation signal, and splice site, and introducing the construct into an
appropriate host cell.

An alternative approach involves the use of homologous recombination to 
activate gene expression by targeting a strong promoter or other regulatory 
sequence to a previously identified gene. 

WO 90/14092 describes in situ modification of genes, in mammalian cells, 
encoding proteins of interest. This application describes single-stranded 
oligonucleotides for site-directed modification of genes encoding proteins of 
interest. A marker may also be included. However, the methods are limited to 
providing an oligonucleotide sequence substantially homologous to a target site. 
Thus, the method requires knowledge of the site required for activation by 
site-directed modification and homologous recombination. Novel genes are not 
discoverable by such methods. 

WO 91/06667 describes methods for expressing a mammalian gene in situ.
With this method, an amplifiable gene is introduced next to a target gene by 
homologous recombination. When the cell is then grown in the appropriate 
medium, both the amplifiable gene and the target gene are amplified and there is 
enhanced expression of the target gene. As above, methods of introducing the 
amplifiable gene are limited to homologous recombination, and are not useful for 
activating novel genes whose sequence (or existence) is unknown. 

WO 91/01140 describes the inactivation of endogenous genes by 
modification of cells by homologous recombination. By these methods, 
homologous recombination is used to modify and inactivate genes and to produce 
cells which can serve as donors in gene therapy. 

WO 92/20808 describes methods for modifying genomic target sites 
in situ. The modifications are described as being small, for example, changing 
single bases in DNA. The method relies upon genomic modification using 
homologous DNA for targeting. 

WO 92/19255 describes a method for enhancing the expression of a target
gene, achieved by homologous recombination in which a DNA sequence is
integrated into the genome or large genomic fragment. This modified sequence
can then be transferred to a secondary host for expression. An amplifiable gene
can be integrated next to the target gene so that the target region can be amplified
for enhanced expression. Homologous recombination is necessary to this targeted
approach.

WO 93/09222 describes methods of making proteins by activating an 
endogenous gene encoding a desired product. A regulatory region is targeted by 
homologous recombination and replacing or disabling the region normally 
associated with the gene whose expression is desired. This disabling or 
replacement causes the gene to be expressed at levels higher than normal. 

WO 94/12650 describes a method for activating expression of and 
amplifying an endogenous gene in situ in a cell, which gene is not expressed or is 
not expressed at desired levels in the cell. The cell is transfected with exogenous
DNA sequences which repair, alter, delete, or replace a sequence present in the 
cell or which are regulatory sequences not normally functionally linked to the 
endogenous gene in the cell. In order to do this, DNA sequences homologous to 
genomic DNA sequences at a preselected site are used to target the endogenous 
gene. In addition, amplifiable DNA encoding a selectable marker can be included. 
By culturing the homologously recombinant cells under conditions that select for 
amplification, both the endogenous gene and the amplifiable marker are 
co-amplified and expression of the gene increased. 

WO 95/31560 describes DNA constructs for homologous recombination.
The constructs include a targeting sequence, a regulatory sequence, an exon, and 
an unpaired splice donor site. The targeting is achieved by homologous 
recombination of the construct with genomic sequences in the cell and allows the 
production of a protein in vitro or in vivo. 

WO 96/29411 describes methods using an exogenous regulatory sequence,
an exogenous exon, either coding or non-coding, and a splice donor site
introduced into a preselected site in the genome by homologous recombination.
In this application, the introduced DNA is positioned so that the transcripts under
control of the exogenous regulatory region include both the exogenous exon and
endogenous exons present in either the thrombopoietin, DNase I, or β-interferon
genes, resulting in transcripts in which the exogenous and endogenous exons are
operably linked. The novel transcription units are produced by homologous
recombination.

U.S. Patent No. 5,272,071 describes the transcriptional activation of 
transcriptionally silent genes in a cell by inserting a DNA regulatory element 
capable of promoting the expression of a gene normally expressed in that cell. 
The regulatory element is inserted so that it is operably linked to the normally
silent gene. The insertion is accomplished by means of homologous recombination 
by creating a DNA construct with a segment of the normally silent gene (the target 
DNA) and the DNA regulatory element used to induce the desired transcription. 

U.S. Patent No. 5,578,461 discusses activating expression of mammalian
target genes by homologous recombination. A DNA sequence is integrated into 
the genome or a large genomic fragment to enhance the expression of the target 
gene. The modified construct can then be transferred to a secondary host. An 
amplifiable gene can be integrated adjacent to the target gene so that the target 
region is amplified for enhanced expression. 

Both of the above approaches (construction of an over-expressing 
construct by cloning or by homologous recombination in vivo) require the gene 
to be cloned and sequenced before it can be over-expressed. Furthermore, using 
homologous recombination, the genomic sequence and structure must also be 
known. 

Unfortunately, many genes have not yet been identified and/or sequenced. 
Thus, a method for over-expressing a gene of interest, whether or not it has been 
previously cloned, and whether or not its sequence and structure are known,
would be useful.

BRIEF SUMMARY OF THE INVENTION



The invention is, therefore, generally directed to methods for
over-expressing an endogenous gene in a cell, comprising introducing a vector
containing a transcriptional regulatory sequence into the cell, allowing the vector
to integrate into the genome of the cell by non-homologous recombination, and
allowing over-expression of the endogenous gene in the cell. The method does
not require previous knowledge of the sequence of the endogenous gene or even
of the existence of the gene. Hence, the invention is directed to non-targeted gene
activation, which as used herein means the activation of endogenous genes by
non-targeted or non-homologous (as opposed to targeted or homologous) integration
of specialized activation vectors into the genome of a host cell.

The invention also encompasses novel vector constructs for activating
gene expression or over-expressing a gene through non-homologous
recombination. The novel construct lacks homologous targeting sequences. That
is, it does not contain nucleotide sequences that target host cell DNA and promote
homologous recombination at the target site, causing over-expression of a cellular
gene via the introduced transcriptional regulatory sequence.

Novel vector constructs include a vector containing a transcriptional
regulatory sequence operably linked to an unpaired splice donor sequence and
further containing one or more amplifiable markers.

Novel vector constructs include constructs with a transcriptional
regulatory sequence operably linked to a translational start codon, a signal
secretion sequence, and an unpaired splice donor site; constructs with a
transcriptional regulatory sequence operably linked to a translational start codon,
an epitope tag, and an unpaired splice donor site; constructs containing a
transcriptional regulatory sequence operably linked to a translational start codon,
a signal sequence and an epitope tag, and an unpaired splice donor site; and constructs
containing a transcriptional regulatory sequence operably linked to a translational
start codon, a signal secretion sequence, an epitope tag, and a sequence-specific
protease site, and an unpaired splice donor site.

The vector construct can contain one or more selectable markers for 
recombinant host cell selection. Alternatively, selection can be effected by 
phenotypic selection for a trait provided by the activated endogenous gene 
product. 

These vectors, and indeed any of the vectors disclosed herein, and variants 
of the vectors that will be readily recognized by one of ordinary skill in the art, can 
be used in any of the methods described herein to form any of the compositions 
producible by these methods. 

The transcriptional regulatory sequence used in the vector constructs of 
the invention includes, but is not limited to, a promoter. In preferred 
embodiments, the promoter is a viral promoter. In highly preferred embodiments, 
the viral promoter is the cytomegalovirus immediate early promoter. In alternative 
embodiments, the promoter is a cellular, non-viral promoter or inducible
promoter. 

The transcriptional regulatory sequence used in the vector construct of the 
invention may also include, but is not limited to, an enhancer. In preferred 
embodiments, the enhancer is a viral enhancer. In highly preferred embodiments, 
the viral enhancer is the cytomegalovirus immediate early enhancer. In alternative 
embodiments, the enhancer is a cellular non-viral enhancer. 

In preferred embodiments of the methods described herein, the vector 
construct may be, or may contain, linear RNA or DNA.

The cell containing the vector may be screened for expression of the gene. 

The cell over-expressing the gene can be cultured in vitro under conditions 
favoring the production, by the cell, of desired amounts of the gene product (also 
referred to interchangeably herein as the "expression product") of the endogenous 
gene that has been activated or whose expression has been increased. The 
expression product can then be isolated and purified to use, for example, in protein 
therapy or drug discovery. 



Alternatively, the cell expressing the desired gene product can be allowed 
to express the gene product in vivo. In certain such aspects of the invention, the 
cell containing a vector construct of the invention integrated into its genome may 
be introduced into a eukaryote (such as a vertebrate, particularly a mammal, more 
particularly a human) under conditions favoring the overexpression or activation 
of the gene by the cell in vivo in the eukaryote. In related such aspects of the 
invention, the cell may be isolated and cloned prior to being introduced into the 
eukaryote. 

The invention is also directed to methods for over-expressing an 
endogenous gene in a cell, comprising introducing a vector containing a 
transcriptional regulatory sequence and one or more amplifiable markers into the 
cell, allowing the vector to integrate into the genome of the cell by 
non-homologous recombination, and allowing over-expression of the endogenous 
gene in the cell. 

The cell containing the vector may be screened for over-expression of the
gene.

The cell over-expressing the gene is cultured such that amplification of the 
endogenous gene is obtained. The cell can then be cultured in vitro so as to 
produce desired amounts of the gene product of the amplified endogenous gene 
that has been activated or whose expression has been increased. The gene product 
can then be isolated and purified. 

Alternatively, following amplification, the cell can be allowed to express 
the endogenous gene and produce desired amounts of the gene product in vivo. 

It is to be understood, however, that any vector used in the methods 
described herein can include one or more amplifiable markers. Thereby, 
amplification of both the vector and the DNA of interest (i.e., containing the 
over-expressed gene) occurs in the cell, and further enhanced expression of the 
endogenous gene is obtained. Accordingly, methods can include a step in which 
the endogenous gene is amplified. 



The invention is also directed to methods for over-expressing an 
endogenous gene in a cell comprising introducing a vector containing a 
transcriptional regulatory sequence and an unpaired splice donor sequence into the
cell, allowing the vector to integrate into the genome of the cell by 
non-homologous recombination, and allowing over-expression of the endogenous 
gene in the cell. 

The cell containing the vector may be screened for expression of the gene. 

The cell over-expressing the gene can be cultured in vitro so as to produce 
desirable amounts of the gene product of the endogenous gene whose expression 
has been activated or increased. The gene product can then be isolated and 
purified. 

Alternatively, the cell can be allowed to express the desired gene product
in vivo.

The vector construct can consist essentially of the transcriptional 
regulatory sequence. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence and one or more amplifiable markers. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence and the splice donor sequence. 

Any of the vector constructs of the invention can also include a secretion 
signal sequence. The secretion signal sequence is arranged in the construct so that 
it will be operably linked to the activated endogenous protein. Thereby, secretion 
of the protein of interest occurs in the cell, and purification of that protein is 
facilitated. Accordingly, methods can include a step in which the protein 
expression product is secreted from the cell. 

The invention also encompasses cells made by any of the above methods. 
The invention encompasses cells containing the vector constructs, cells in which 
the vector constructs have integrated into the cellular genome, and cells which are 
over-expressing desired gene products from an endogenous gene, over-expression 
being driven by the introduced transcriptional regulatory sequence. 

The cells can be isolated and cloned.

The methods can be carried out in any cell of eukaryotic origin, such as 
fungal, plant or animal. In preferred embodiments, the methods of the invention 
may be carried out in vertebrate cells, and particularly mammalian cells including 
but not limited to rat, mouse, bovine, porcine, sheep, goat and human cells, and 
more particularly in human cells. 

A single cell made by the methods described above can over-express a 
single gene or more than one gene. More than one gene in a cell can be activated 
by the integration of a single type of construct into multiple locations in the 
genome. Similarly, more than one gene in a cell can be activated by the 
integration of multiple constructs (i.e., more than one type of construct) into 
multiple locations in the genome. Therefore, a cell can contain only one type of 
vector construct or different types of constructs, each capable of activating an 
endogenous gene. 

The invention is also directed to methods for making the cells described 
above by one or more of the following: introducing one or more of the vector 
constructs of the invention into a cell; allowing the introduced construct(s) to 
integrate into the genome of the cell by non-homologous recombination; allowing 
over-expression of one or more endogenous genes in the cell; and isolating and 
cloning the cell. The invention is also directed to cells produced by such methods, 
which may be isolated cells. 

The invention also encompasses methods for using the cells described 
above to over-express a gene, such as an endogenous cellular gene, that has been 
characterized (for example, sequenced), uncharacterized (for example, a gene 
whose function is known but which has not been cloned or sequenced), or a gene 
whose existence was, prior to over-expression, unknown. The cells can be used 
to produce desired amounts of an expression product in vitro or in vivo. If 
desired, this expression product can then be isolated and purified, for example by 
cell lysis or by isolation from the growth medium (as when the vector contains a 
secretion signal sequence). 

The invention also encompasses libraries of cells made by the
above-described methods. A library can encompass all of the clones from a single
transfection experiment or a subset of clones from a single transfection
experiment. The subset can over-express the same gene or more than one gene,
for example, a class of genes. The transfection can have been done with a single
construct or with more than one construct.

A library can also be formed by combining all of the recombinant cells
from two or more transfection experiments, by combining one or more subsets of
cells from a single transfection experiment or by combining subsets of cells from
separate transfection experiments. The resulting library can express the same
gene, or more than one gene, for example, a class of genes. Again, in each of
these individual transfections, a unique construct or more than one construct can
be used.

Libraries can be formed from the same cell type or different cell types. 

The invention is also directed to methods for making libraries by selecting
various subsets of cells from the same or different transfection experiments.

The invention is also directed to methods of using the above-described 
cells or libraries of cells to over-express or activate endogenous genes, or to 
obtain the gene expression products of such over-expressed or activated genes. 

According to this aspect of the invention, the cell or library may be screened for
the expression of the gene and cells that express the desired gene product may be
selected. The cell can then be used to isolate or purify the gene product for
subsequent use. Expression in the cell can occur by culturing the cell in vitro,
under conditions favoring the production of the expression product of the
endogenous gene by the cell, or by allowing the cell to express the gene in vivo.

In preferred embodiments of the invention, the methods include a process 
wherein the expression product is isolated or purified. In highly preferred 
embodiments, the cells expressing the endogenous gene product are cultured 
under conditions favoring production of sufficient amounts of gene product for 
commercial application, and especially for diagnostic, therapeutic and drug
discovery uses. 

Any of the methods can further comprise introducing double-strand breaks 
into the genomic DNA in the cell prior to or simultaneously with vector 
integration. 

The invention also is directed to vector constructs that are useful for 
activating expression of endogenous genes and for isolating the mRNA and cDNA
corresponding to the activated genes. 

In one such embodiment, the vector construct may comprise (a) a first 
transcriptional regulatory sequence operably linked to a first unpaired splice donor 
sequence; (b) a second transcriptional regulatory sequence operably linked to a 
second unpaired splice donor sequence; and (c) a linearization site, which may be 
located between the first and second transcriptional regulatory sequences. 
According to the invention, when the vector construct is transformed into a host 
cell and then integrates into the genome of the host cell, the first transcriptional 
regulatory sequence is preferably in an inverted orientation relative to the 
orientation of the second transcriptional regulatory sequence. In certain preferred 
such embodiments, the vector may be rendered linear by cleavage at the 
linearization site. 

In another embodiment, the invention provides a linear vector construct 
having a 3' end and a 5' end, comprising a transcriptional regulatory sequence 
operably linked to an unpaired splice donor site, wherein the transcriptional
regulatory sequence is oriented in the linear vector construct in an orientation that
directs transcription towards the 3' end or the 5' end of the linear vector construct.

In another embodiment, the invention provides a vector construct 
comprising, in sequential order, (a) a transcriptional regulatory sequence, (b) an 
unpaired splice donor site, (c) a rare cutting restriction site, and (d) a linearization 
site. 

In another embodiment, the invention provides a vector construct 
comprising (a) a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal; and (b) a second transcriptional
regulatory sequence operably linked to an exon-splice donor site complex, wherein 
the first transcriptional regulatory sequence is in the same orientation in the vector 
construct as is the second transcriptional regulatory sequence, and wherein the 
first transcriptional regulatory sequence is upstream of the second transcriptional 
regulatory sequence in the vector construct. 

In additional embodiments, the invention provides vector constructs 
comprising a transcriptional regulatory sequence operably linked to a selectable 
marker lacking a polyadenylation signal, and further comprising an unpaired splice 
donor site. 

In another embodiment, the invention provides vector constructs 
comprising a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal, and further comprising a 
second transcriptional regulatory sequence operably linked to an unpaired splice 
donor site. 

According to the invention, the transcriptional regulatory sequence (or first 
or second transcriptional regulatory sequence, in vector constructs having more 
than one transcriptional regulatory sequence) may be a promoter, an enhancer, or 
a repressor, and is preferably a promoter, including an animal cell promoter, a 
plant cell promoter, or a fungal cell promoter, most preferably a promoter selected 
from the group consisting of a CMV immediate early gene promoter, an SV40 T
antigen promoter, and a β-actin promoter. Other promoters of animal, plant, or
fungal cell origin that may be used in accordance with the invention are known in 
the art and will be familiar to one of ordinary skill in view of the teachings herein. 

The selectable marker used in the vector constructs of the invention may 
be any marker or marker gene that, upon integration of a vector containing the 
selectable marker into the host cell genome, permits the selection of a cell 
containing or expressing the marker gene. Suitable such selectable markers 
include, but are not limited to, a neomycin gene, a hypoxanthine phosphoribosyl
transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase
gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate
reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene,
a xanthine-guanine phosphoribosyl transferase gene, an adenosine deaminase gene,
and a thymidine kinase gene.

In related embodiments, the invention provides vector constructs 
comprising a positive selectable marker, a negative selectable marker, and an 
unpaired splice donor site, wherein the positive and negative selectable markers 
and the splice donor site are oriented in the vector construct in an orientation that 
results in expression of the positive selectable marker in active form, and either 
non-expression of said negative selectable marker or expression of the negative 
selectable marker in inactive form, when the vector construct is integrated into the 
genome of a eukaryotic host cell and activates an endogenous gene in the genome. 
In certain preferred such embodiments, either the positive selection marker, the 
negative selection marker, or both, may lack a polyadenylation signal. The 
positive selection marker used in these aspects of the invention may be any 
selection marker that, upon expression, produces a protein capable of facilitating 
the isolation of cells expressing the marker, including but not limited to a 
neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene,
a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl
phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 
gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl 
transferase gene, or an adenosine deaminase gene. Analogously, the negative 
selection marker used in these aspects of the invention may be any selection 
marker that, upon expression, produces a protein capable of facilitating removal 
of cells expressing the marker, including but not limited to a hypoxanthine
phosphoribosyl transferase gene, a thymidine kinase gene, or a diphtheria toxin
gene. 

The invention also is directed to eukaryotic host cells, which may be 
isolated host cells, comprising one or more of the vector constructs of the 
invention. Preferred such eukaryotic host cells include, but are not limited to, 
animal cells (including, but not limited to, mammalian (particularly human) cells,
insect cells, avian cells, annelid cells, amphibian cells, reptilian cells, and fish cells), 
plant cells, and fungal (particularly yeast) cells. In certain such host cells, the 
vector construct may be integrated into the genome of the host cell. 

The invention also is directed to primer molecules comprising a PCR- 
amplifiable sequence and a degenerate 3' terminus. Primer molecules according 
to this aspect of the invention preferably have the general structure: 

5'-(dT)a-X-Nb-TTTATT-3',
wherein a is a whole number from 1 to 100 (preferably from 10 to 30), X is a
PCR-amplifiable sequence consisting of a nucleic acid sequence of about 10-20
nucleotides in length, N is any nucleotide, and b is a whole number from 0 to 6.
One preferred such primer has the nucleotide sequence
5'-TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT-3' (SEQ ID NO: 10). In related
embodiments, the primer molecules according to this aspect of the invention may
be biotinylated.
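
The general primer structure can be illustrated with a minimal Python sketch. The function names `build_primer` and `expand_degenerate` are hypothetical and are not part of the invention as described; the sketch assumes that the PCR-amplifiable segment X of the preferred primer reads CGTCAGCGGCCGCATC, with a = 12 and b = 4.

```python
from itertools import product

FIXED_3_PRIME = "TTTATT"  # fixed 3' terminus from the general structure


def build_primer(a: int, x: str, b: int) -> str:
    """Return 5'-(dT)a-X-Nb-TTTATT-3' as a string, leaving each N degenerate."""
    if not 1 <= a <= 100:
        raise ValueError("a must be a whole number from 1 to 100")
    if not 0 <= b <= 6:
        raise ValueError("b must be a whole number from 0 to 6")
    if not x or set(x) - set("ACGT"):
        raise ValueError("X must be a non-empty sequence of A, C, G and T")
    return "T" * a + x + "N" * b + FIXED_3_PRIME


def expand_degenerate(primer: str):
    """Yield every concrete sequence represented by a primer containing N."""
    choices = ["ACGT" if base == "N" else base for base in primer]
    for combination in product(*choices):
        yield "".join(combination)


# Assumed reading of the preferred primer: a = 12, b = 4.
preferred = build_primer(12, "CGTCAGCGGCCGCATC", 4)
pool_size = sum(1 for _ in expand_degenerate(preferred))  # 4**4 concrete sequences
```

Because each N may be any of four nucleotides, a primer with b degenerate positions stands for a pool of 4 to the power b distinct sequences, which is why b is kept small in this sketch.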

The invention also is directed to methods for first strand cDNA synthesis 
comprising (a) annealing a first primer of the invention (such as the primer 
described above) to an RNA template molecule to form a first primer-RNA
complex, and (b) treating this first primer-RNA complex with reverse transcriptase 
and one or more deoxynucleoside triphosphate molecules under conditions 
favoring the reverse transcription of the first primer-RNA complex to synthesize 
a first strand cDNA. 

The invention also is directed to methods for isolating activated genes, 
particularly from a host cell genome. These methods of the invention exploit the 
structure of the mRNA molecules produced using the non-targeted gene activation
vectors of the invention. One such method of the invention comprises, for 
example, (a) introducing a vector construct comprising a transcriptional regulatory 
sequence and an unpaired splice donor site into a host cell (preferably one of the 
eukaryotic host cells described above), (b) allowing the vector construct to 
integrate into the genome of the host cell by non-homologous recombination, 
under conditions such that the vector activates an endogenous gene comprising
an exon in the genome, (c) isolating RNA from the host cell, (d) synthesizing first
strand cDNA according to the method of the invention described above,
(e) annealing a second primer specific for the vector-encoded exon to the first
strand cDNA to create a second primer-first strand cDNA complex, and
(f) contacting the second primer-first strand cDNA complex with a DNA
polymerase under conditions favoring the production of a second strand cDNA
substantially complementary to the first strand cDNA. Methods according to this
aspect of the invention may comprise one or more additional steps, such as 
treating the second strand cDNA with a restriction enzyme that cleaves at a 
restriction site located on the vector downstream of the unpaired splice donor site, 
or amplifying the second strand cDNA using a third primer specific for the vector- 
encoded exon and a fourth primer specific for the second primer. The invention 
also is directed to isolated genes produced according to these methods, and to 
vectors (which may be expression vectors) and host cells comprising these isolated 
genes. The invention also is directed to methods of producing a polypeptide, 
comprising cultivating a host cell comprising the isolated gene (or a vector, 
particularly an expression vector, comprising the isolated gene), and culturing the 
host cell under conditions favoring the expression by the host cell of a polypeptide
encoded by the isolated gene. The invention also provides additional methods of 
producing a polypeptide, comprising introducing into a host cell a vector 
comprising a transcriptional regulatory sequence operably linked to an exonic 
region followed by an unpaired splice donor site, and culturing the host cell under 
conditions favoring the expression by said host cell of a polypeptide encoded by 
the exonic region, wherein the exon contains a translational start site positioned 
at any of the open reading frame positions relative to the 5'-most base of the 
unpaired splice donor site (e.g., the "A" in the ATG start codon may be at position 
-3 or at an increment of 3 bases upstream therefrom (e.g., -6, -9, -12, -15, -18,
etc.), at position -2 or at an increment of 3 bases upstream therefrom (e.g., -5, -8,
-11, -14, -17, -20, etc.), or at position -1 or at an increment of 3 bases upstream
therefrom (e.g., -4, -7, -10, -13, -16, -19, etc.), relative to the 5'-most base of the
splice donor site). In related embodiments, the methods of the invention may
further comprise isolating the polypeptide. The invention also is directed to
polypeptides, which may or may not be isolated polypeptides, produced according
to these methods.
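
The three families of start codon positions recited above can be summarized arithmetically. The following Python sketch is illustrative only; the names `frame_class` and `positions_in_frame` are hypothetical. It assumes that positions are negative integers counted upstream from the 5'-most base of the unpaired splice donor site, and groups them by the remainder of the distance on division by 3.

```python
def frame_class(position: int) -> int:
    """Classify the position of the "A" of the ATG start codon.

    Returns 0 for -3, -6, -9, ...; 2 for -2, -5, -8, ...; and 1 for
    -1, -4, -7, ... (the remainder of the upstream distance modulo 3).
    """
    if position >= 0:
        raise ValueError("position must be a negative integer")
    return (-position) % 3


def positions_in_frame(frame: int, count: int = 6) -> list:
    """List the first `count` positions belonging to one frame class."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    first = -3 if frame == 0 else -frame  # nearest position in this class
    return [first - 3 * i for i in range(count)]


# One vector per class covers all three possible reading frames.
all_frames = {frame: positions_in_frame(frame) for frame in (0, 1, 2)}
```

Two start positions fall in the same class exactly when they differ by a multiple of 3 bases, which is the relationship expressed by "an increment of 3 bases upstream therefrom."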

Other preferred embodiments of the present invention will be apparent to 
one of ordinary skill in light of the following drawings and description of the 
invention, and of the claims. 



BRIEF DESCRIPTION OF THE DRAWINGS 



FIG. 1. Schematic diagram of gene activation events described herein. 
The activation construct is transfected into cells and allowed to integrate into the 
host cell chromosomes at DNA breaks. If breakage occurs upstream of a gene of 
interest (e.g., Epo), and the appropriate activation construct integrates at the 
break such that its regulatory sequence becomes operably linked to the gene of 
interest, activation of the gene will occur. Transcription and splicing produce a 
chimeric RNA molecule containing exonic sequences from the activation construct 
and from the endogenous gene. Subsequent translation will result in the 
production of the protein of interest. Following isolation of the recombinant cell, 
gene expression can be further enhanced via gene amplification. 

FIG. 2. Schematic diagram of non-translated activation constructs. The 
arrows denote promoter sequences. The exonic sequences are shown as open 
boxes and the splice donor sequence is indicated by S/D. Construct numbers 
corresponding to the description below are shown on the left. The selectable and 
amplifiable markers are not shown. 

FIG. 3. Schematic diagram of translated activation constructs. The 
arrows denote promoter sequences. The exonic sequences are shown as open 
boxes and the splice donor sequence is indicated by S/D. The translated, signal 
peptide, epitope tag, and protease cleavage sequences are shown in the legend
below the constructs. Construct numbers corresponding to the description below
are shown on the left. The selectable and amplifiable markers are not shown. 

FIG. 4. Schematic diagram of an activation construct capable of 
activating endogenous genes. 

FIG. 5A-5D. Nucleotide sequence of pRIG8R1-CD2 (SEQ ID NO:7).

FIG. 6A-6C. Nucleotide sequence of pRIG8R2-CD2 (SEQ ID NO:8).

FIG. 7A-7C. Nucleotide sequence of pRIG8R3-CD2 (SEQ ID NO:9). 

FIG. 8A-8F. Examples of poly(A) trap vectors. Each vector is illustrated
schematically in its linearized form. Each horizontal line represents a DNA 
molecule. The arrows denote promoter sequences located on the DNA molecule, 
and face in the direction of transcription. Transcribed regions include all 
sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes and open reading frames are designated by open 
boxes. The following designations were used: splice donor site (S/D), signal
secretion sequence (SP), epitope tag (ET), neomycin resistance gene (Neo). In 
the vectors depicted in Fig. 8B-8E, it is possible to omit the splice donor site 
immediately downstream of the Neo gene. In vectors lacking a splice donor site 
between the neo gene and the downstream promoter, the Neo transcript will 
utilize the splice donor site located 3' of the downstream promoter. In addition, 
as shown in the vectors depicted in Fig. 8B-8E, a downstream promoter may drive 
expression of an exon. It is recognized that this exon, when present, may encode 
codons in any reading frame. Using multiple vectors, codons in each of the 3 
possible reading frames can be created. 

FIG. 9A-9F. Examples of splice acceptor trap vectors containing a 
positive and a negative selectable marker driven from a single promoter. Each 
vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences located on 
the DNA molecule, and face in the direction of transcription. Transcribed regions 
include all sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes. Poly(A) signals are not present in these examples. 
As described in the specification, however, poly(A) signals may be placed on the
vector 3' of either or both selectable markers. The following designations were
used: splice donor site (S/D), signal secretion sequence (SP), epitope tag (ET),
internal ribosome entry site (ires), hypoxanthine phosphoribosyl transferase
(HPRT), and neomycin resistance gene (Neo). In these examples, Neo represents 
the positive selectable marker and HPRT represents the negative selectable 
marker. In the vectors shown in Fig. 9C and 9F, the region designated exon 
contains a translation start codon. As described in the Detailed Description, the 
exon may encode a methionine residue, a partial signal sequence, a full signal 
secretion sequence, a portion of a protein, or an epitope tag. In addition, the 
codons may be present in any reading frame relative to the splice donor site. In 
other vector examples not shown, the region designated exon lacks a translation 
start codon. 

FIG. 10A-10F. Examples of splice acceptor trap vectors containing a 
positive and negative selectable marker driven from different promoters. Each 
vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences located on 
the DNA molecule, and face in the direction of transcription. Transcribed regions 
include all sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes. Poly(A) signals are not present in these examples. 
As described in the specification, however, poly(A) signals may be placed on the 
vector 3' of either or both selectable markers. The following designations were 
used: splice donor site (S/D), internal ribosome entry site (ires), hypoxanthine 
phosphoribosyl transferase (HPRT), and neomycin resistance gene (Neo). In the 
vectors shown in Figs. 10A-10F, Neo represents the positive selectable marker 
and HPRT represents the negative selectable marker. As shown, the vectors 
depicted in Figs. 10A-10F do not contain a splice donor site 3' of the Neo gene; 
however, in other vectors not shown, a splice donor site may be located 3' of the 
Neo gene to facilitate splicing of the positive selection marker to an endogenous 
exon. In the vectors shown in Fig. 10C and 10F, the region designated exon
contains a translation start codon. As described in the Detailed Description, the
exon may encode a methionine residue, a partial signal sequence, a full signal 
secretion sequence, a portion of a protein, or an epitope tag. In addition, the 
codons may be present in any reading frame relative to the splice donor site. In 
other vector examples not shown, the region designated exon lacks a translation 
start codon. 

FIG. 11A-11C. Schematic diagram of bidirectional activation vectors. 
The arrows denote promoter sequences. The exons are shown as checkered boxes 
and splice donor sites are indicated by S/D. The hatched boxes indicate exon 
sequences operably linked to the upstream promoter. It is understood that the 
exons on these vectors may be untranslated, or may contain a start codon and 
additional codons as described herein. As illustrated in the vectors depicted in 
Fig. 11B-11C, the vectors may contain a selectable marker. In these vectors, the
neomycin resistance (Neo) gene is illustrated. In Fig. 11B, a polyadenylation
signal (pA) is located downstream of the selectable marker. In Fig. 11C,
polyadenylation signals are omitted from the vector. 

FIG. 12A-12G. Examples of vectors useful for recovering exon I from 
activated endogenous genes. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences located on the DNA molecule, and face in the 
direction of transcription. Transcribed regions include all sequences located 
downstream of a promoter. Untranslated regions are designated by hatched 
boxes. Poly(A) signals are not present in the vectors depicted. As discussed in the
Detailed Description, however, poly(A) signals may be placed on the vector 3' of
either or both selectable markers. The following designations were used: splice 
donor site (S/D), internal ribosome entry site (ires), hypoxanthine phosphoribosyl 
transferase (HPRT), and neomycin resistance gene (Neo). In these examples, Neo 
represents the positive selectable marker and HPRT represents the negative 
selectable marker. It is also recognized that in these examples, the region 
designated exon, when present, lacks a translation start codon. In other examples 
not shown, the region designated exon contains a translation start codon.
Furthermore, when the vector exon contains a translation start codon, the exon 
may encode a methionine residue, a partial signal sequence, a full signal secretion 
sequence, a portion of a protein, or an epitope tag. In addition, the codons may 
be present in each reading frame relative to the splice donor site. 

FIG. 13. Illustration depicting two transcripts produced from the 
integrated vectors described in Figures 12A-12G. DNA strands are depicted as 
horizontal lines. Vector DNA is shown as a black line. Endogenous genomic 
DNA is shown as a grey line. Rectangles depict exons. Vector-encoded exons 
are shown as open rectangles, while endogenous exons are shown as shaded 
boxes. S/D denotes a splice donor site. Following integration, the vector encoded 
promoters activate transcription of the endogenous gene. Transcription resulting 
from the upstream promoter produces a spliced RNA molecule containing the 
vector encoded exon joined to the second and subsequent exons from an 
endogenous gene. Transcription from the downstream promoter, on the other 
hand, produces a transcript containing the sequences downstream of the integrated
vector joined to exon I and the subsequent exons from an endogenous gene.

FIG. 14A-14B. Nucleotide sequence of pRIG1 (SEQ ID NO: 18).

FIG. 15A-15B. Nucleotide sequence of pRIG21b (SEQ ID NO: 19).

FIG. 16A-16B. Nucleotide sequence of pRIG22b (SEQ ID NO:20).

FIG. 17A-17G. Examples of poly(A) trap vectors. Each vector is
illustrated schematically in its linearized form. Each horizontal line represents a 
DNA molecule. The arrows denote promoter sequences located on the DNA 
molecule, and face in the direction of transcription. Transcribed regions include 
all sequences located downstream of a promoter. Boxes indicate exons. Hatched 
boxes indicate untranslated regions. The following designations were used: splice 
donor site (S/D), signal secretion sequence (SP), epitope tag (ET), neomycin 
resistance gene (Neo), vector promoter #1 (VP#l), and vector promoter #2 
(VP#2). As shown in the vectors depicted in Fig. 17C-17G, a promoter operably
linked to an exon and an unpaired splice donor site can be positioned upstream of
the selectable marker. It is recognized that this exon, when present, may encode
a start codon in any reading frame relative to the splice donor site. To
activate protein expression from genes with different reading frames, three 
separate vectors can be used, each with a start codon in a different reading frame 
relative to the splice donor site. 

FIG. 18. Illustration of the transcripts produced by the vector from Fig. 
17C upon integration into a host cell genome upstream of a multi-exon 
endogenous gene. Each horizontal line represents a DNA molecule. Vertical lines 
running through the DNA strand mark the upstream and downstream 
vector/cellular genome boundaries. The arrows denote promoter sequences 
located on the DNA molecule, and face in the direction of transcription. 
Transcribed regions include all sequences located downstream of a promoter. 
Boxes indicate exons. Hatched boxes indicate untranslated regions. The 
endogenous exons are numbered using roman numerals. The following 
designations were used: splice donor site (S/D), neomycin resistance gene (Neo), 
vector promoter #1 (VP#1), vector promoter #2 (VP#2), endogenous promoter 
(EP) and polyadenylation signal (pA). Following integration, vector promoter #1 
expresses a chimeric transcript containing the Neo gene linked to the genomic 
sequences downstream of the integration site, including the processed (spliced) 
exons from the endogenous gene. Since transcript #1 contains a poly(A) signal
from the endogenous gene, the Neo gene product will be efficiently produced,
thereby conferring drug resistance on the cell. In addition to transcript #1, the
integrated vector will generate a second transcript, designated transcript #2,
originating from vector promoter #2. The structure of transcript #2 facilitates
efficient translation of the protein encoded by the endogenous gene. As 
exemplified in Figure 17, vectors containing alternative coding information in the 
vector encoded exon can be used to produce different chimeric proteins, 
containing, for example, signal sequences and/or epitope tags. 

FIG. 19. Example of dual positive selectable marker vector. The vector 
is illustrated schematically in its linearized form. The horizontal line represents a 
DNA molecule. The arrows denote promoter sequences located on the DNA
molecule, and face in the direction of transcription. Transcribed regions include 
all sequences located downstream of a promoter. Boxes indicate exons. Hatched 
boxes indicate untranslated regions. Poly(A) signals are not present in these
examples. The following designations were used: splice donor site (S/D),
hygromycin resistance gene (Hyg), neomycin resistance gene (Neo), vector
promoter #1, and vector promoter #2.

FIG. 20A-20B. Examples of transcripts produced by a dual positive
selectable marker vector integrated into a host cell genome adjacent to an
endogenous gene. Figure 20A illustrates the transcripts produced upon vector
integration near a multi-exon gene. Figure 20B illustrates the transcripts produced
upon vector integration near a single exon gene. Each horizontal line represents
a DNA molecule. Vertical lines running through the DNA strand mark the
upstream and downstream vector/cellular genome boundaries. The arrows denote
promoter sequences located on the DNA molecule, and face in the direction of
transcription. Transcribed regions include all sequences located downstream of
each promoter. Boxes indicate exons. Hatched boxes indicate untranslated
regions. The endogenous exons are numbered using roman numerals. The
following designations were used: splice donor site (S/D), hygromycin resistance
gene (Hyg), neomycin resistance gene (Neo), vector promoter #1 (VP#1), vector
promoter #2 (VP#2), endogenous promoter (EP), and polyadenylation signal
(pA). Following integration, vector promoter #1 expresses a chimeric transcript
containing the Hyg gene linked to the genomic sequences downstream of the
integration site, including the processed (spliced) exons from the endogenous
gene. Since transcript #1 contains a poly(A) signal from the endogenous gene,
the Hyg gene product will be efficiently produced, thereby conferring drug
resistance on the cell. In addition to transcript #1, the integrated vector will
generate a second transcript, designated transcript #2, originating from vector
promoter #2. In Figure 20A, the neo gene is removed from transcript #2 upon
splicing between the vector encoded splice donor site and the first endogenous splice
acceptor located downstream of the vector integration site (i.e., exon II in this
example). Since multi-exon genes contain splice acceptor sites at the 5' end of 
each exon (except exon I), the neo gene will be removed from transcript #2 in 
cells in which the vector has integrated near, and transcriptionally activated, a 
multi-exon gene. As a result, cells having activated multi-exon genes may be 
eliminated by selecting with G418 and hygromycin. In figure 20B, the neo gene 
is not removed from transcript #2 by splicing, since single exon genes do not 
contain any splice acceptor sequences. Thus, cells containing a vector integrated 
near single exon genes will survive double selection with G418 and hygromycin. 
These cells can be used to efficiently isolate the activated single exon genes using 
methods described herein. 
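
The selection outcome described for Figures 20A and 20B can be summarized in a short sketch. The Python below is an illustration only and is not part of the disclosed methods. The function name and the all-or-nothing model (a marker either is or is not retained on its transcript) are assumptions made for clarity, and effects such as inefficient splicing are ignored.

```python
def survives_double_selection(downstream_splice_acceptor_present: bool) -> bool:
    """Toy model of G418 + hygromycin selection for the Figure 20 vector.

    Assumption: transcript #1 always acquires a poly(A) signal from the
    endogenous gene, so Hyg is expressed in both cases.
    """
    hyg_expressed = True
    # Neo lies between the vector splice donor and the first downstream
    # splice acceptor, so splicing removes it from transcript #2.
    neo_retained = not downstream_splice_acceptor_present
    # Survival in G418 plus hygromycin requires both markers.
    return hyg_expressed and neo_retained


# Multi-exon gene (Figure 20A): exon II supplies a splice acceptor.
print(survives_double_selection(True))   # False
# Single exon gene (Figure 20B): no splice acceptor is available.
print(survives_double_selection(False))  # True
```

In this simplified model, only integrations near single exon genes pass both selections, which matches the outcome stated above.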

FIG. 21A-21B. Examples of dual trap vectors containing a positive and
a negative selectable marker. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences located on the DNA molecule, and face in the 
direction of transcription. Transcribed regions include all sequences located 
downstream of a promoter. Boxes indicate exons. Hatched boxes indicate 
untranslated regions. The following designations were used: splice donor site 
(S/D), hypoxanthine phosphoribosyl transferase (HPRT), neomycin resistance 
gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP#2), and vector 
promoter #3 (VP#3). In the vectors shown in Figs. 21A-21B, Neo represents the
positive selectable marker and HPRT represents the negative selectable marker.
In Figure 21B, a third promoter is located upstream of the selectable markers. This
upstream promoter is operably linked to an exon and unpaired splice donor site.
The region designated exon contains a translation start codon in this example.
As described herein, the exon may encode a methionine residue, a partial signal 
sequence, a full signal secretion sequence, a portion of a protein, or an epitope 
tag. In addition, the codons may be present in any reading frame relative to the 
splice donor site. In other vector examples not shown, the region designated exon 
lacks a translation start codon.

FIG. 22. Examples of transcripts produced by a dual positive/negative
selectable marker vector integrated into a host cell genome upstream of a multi- 
exon endogenous gene. Each horizontal line represents a DNA molecule. Vertical 
lines running through the DNA strand mark the upstream and downstream 
vector/cellular genome boundaries. The arrows denote promoter sequences 
located on the DNA molecule, and face in the direction of transcription. 
Transcribed regions include all sequences located downstream of each promoter. 
Boxes indicate exons. Hatched boxes indicate untranslated regions. The 
endogenous exons are numbered using roman numerals. The following 
designations were used: splice donor site (S/D), neomycin resistance gene (Neo), 
vector promoter #1 (VP#1), vector promoter #2 (VP#2), vector promoter #3
(VP#3), polyadenylation signal (pA), and endogenous promoter (EP). Following
integration, vector promoter #1 expresses a chimeric transcript containing the Neo
gene linked to the genomic sequences downstream of the integration site, 
including the processed (spliced) exons from the endogenous gene. Since 
transcript #1 contains a poly (A) signal from the endogenous gene, the Neo gene
product will be efficiently produced, thereby conferring drug resistance on the cell.
In addition to transcript #1, the integrated vector will generate a second transcript,
designated transcript #2, originating from vector promoter #2. In this example,
the vector has integrated upstream of a multi-exon gene. Since multi-exon genes
contain splice acceptor sites at the 5' end of each exon, the HPRT gene will be 
removed from transcript #2 in cells in which the vector has integrated near, and 
transcriptionally activated, a multi-exon gene. As a result, cells containing 
activated multi-exon genes may be isolated by selecting with G418 and 8- 
Azaguanine 6-Thioguanine (AgThg). Thus, cells containing a vector integrated
near multi-exon genes will survive double selection with G418 and AgThg. These
cells can be used to efficiently isolate the activated multi-exon genes using 
methods described herein. In addition to transcripts #1 and #2, a third transcript, 
designated transcript #3 is produced from the integrated vector. Transcript #3, 
originating from vector promoter #3, contains an exonic sequence suitable for
directing protein expression from the endogenous gene. This occurs following
splicing from the first splice donor site downstream of promoter #3 to the first 
downstream splice acceptor site from the endogenous gene. In addition to 
directing protein expression, transcript #3, and/or transcripts #1 and/or #2, can 
be isolated for gene discovery purposes using the methods described herein. 

FIG. 23A-23D. Example of a multi-Promoter/Activation Exon Vector. 
Each vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences. Boxes 
indicate exons. Hatched boxes indicate untranslated regions. It is understood that 
the exons on these vectors may be untranslated, or may contain a start codon and 
additional codons as described herein. The following designations were used:
splice donor site (S/D), vector promoter #1 (VP #1), vector promoter #2 (VP#2), 
vector promoter #3 (VP #3), and vector promoter #4 (VP#4). Individual vector 
activation exons are designated A, B, C, and D. Each activation exon may contain 
a different structure. The structure of each activation exon and its flanking intron 
are shown below. It is understood, however, that any activation exon described 
herein, may be used on these vectors, in any combination and/or order, including 
exons that encode signal sequences, partial signal sequences, epitope tags, 
proteins, portions of proteins, and protein motifs. Any of the exons may lack a 
start codon. In addition, while not illustrated in these examples, these vectors may 
contain a selectable marker and/or an amplifiable marker. The selectable marker 
may contain a poly (A) signal or a splice donor site. When present, the splice 
donor site may be located upstream or downstream of the selectable marker. 
Alternatively, the selectable marker may not be operably linked to a poly (A) 
signal and/or a splice donor site. 

FIG. 24. Examples of transcripts produced from a
multi-Promoter/Activation Exon Vector upon integration into a host cell genome
upstream of an endogenous gene. Each horizontal line represents a DNA 
molecule. Vertical lines running through the DNA strand mark the upstream and 
downstream vector/cellular genome boundaries. The arrows denote promoter
sequences located on the DNA molecule, and face in the direction of transcription.
Transcribed regions include all sequences located downstream of each promoter.
Boxes indicate exons. Hatched boxes indicate untranslated regions. The
endogenous exons are numbered using roman numerals. The following 
designations were used: splice donor site (S/D), vector promoter #1 (VP #1), 
vector promoter #2 (VP#2), vector promoter #3 (VP #3), vector promoter #4 
(VP#4), endogenous promoter (EP), and polyadenylation signal (pA). Individual 
vector activation exons are designated A, B, C, and D. Following integration,
each vector encoded promoter is capable of producing a different transcript. Each 
transcript contains a different activation exon joined to the first downstream splice 
acceptor site from an endogenous gene (exon II in this example). Individual 
activation exons are designated by (A), (B), (C), or (D). Endogenous exons are 
designated by (I), (II), (III), or (IV). Generally, the coding sequence and/or 
reading frames, if present, are different among the activation exons. While four 
activation exons are illustrated in this example, any number of activation exons 
may be present on the integrated vector. 

FIG. 25A-25D. Examples of activation vectors useful for detection of 
protein-protein interactions. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences. Boxes indicate exons. Hatched boxes indicate 
untranslated regions. The following designations were used: splice donor site 
(S/D) and neomycin resistance gene (Neo). It is also recognized that the DNA
binding domain and the activation domain may be encoded in any reading frame
(relative to the splice donor site), allowing activation of endogenous genes with 
different reading frames. 

FIG. 26. Schematic illustration depicting one approach to detecting 
protein-protein interactions using the vectors shown in Figure 25. Each horizontal
line represents a DNA molecule. Vertical lines running through the DNA strand 
mark the upstream and downstream vector/cellular genome boundaries. The 
arrows denote promoter sequences located on the DNA molecule, and face in the
direction of transcription. Transcribed regions include all sequences located
downstream of each promoter. Boxes indicate exons. Hatched boxes indicate
untranslated regions. The endogenous exons are numbered using roman numerals.
The following designations were used: splice donor site (S/D), binding domain
(BD), activation domain (AD), recognition sequence (RS), and polyadenylation
signal (pA). The binding domain vector is shown integrated into the genome of
a host cell, upstream of an endogenous gene, designated gene A. The activation 
domain vector is shown integrated into the genome of the same host cell upstream 
of an endogenous gene, designated gene B. Both vectors are integrated into the
genome of the same host cell. Following integration, each vector is capable of
producing a fusion protein containing the binding domain (or activation domain,
as the case may be) and the protein encoded by the downstream endogenous gene. 
If the binding domain fusion protein interacts with the activation domain fusion 
protein, a protein complex will be formed. This complex is capable of increasing
expression of a reporter gene present in the cell.

FIG. 27. Examples of activation vectors useful for in vitro and in vivo 
transposition. Each vector is illustrated schematically in its linearized form. Each 
horizontal line represents a DNA molecule. The arrows denote promoter 
sequences. Boxes indicate exons. Hatched boxes indicate untranslated regions.
The solid boxes indicate the transposon signals. It is recognized that there is
directionality to the transposon signals, and that the signals are oriented in the
configuration suitable for the type of transposition reaction (integration, inversion, 
or deletion). The following designations were used: splice donor site (S/D), 
neomycin resistance gene (Neo), dihydrofolate reductase (DHFR), puromycin
resistance gene (Puro), poly (A) signal (pA), and the Epstein Barr Virus origin of
replication (ori P). It is also recognized that the activation exon may encode amino
acids in any reading frame (relative to the splice donor site), allowing activation 
of endogenous genes with different reading frames. 

FIG. 28. Schematic illustration depicting integration of an activation
vector into a cloned genomic DNA fragment by in vitro transposition. Each
horizontal line represents a DNA molecule. The cloned genomic DNA is in a
BAC vector. The single line represents the genomic DNA and the rectangle 
depicts the BAC vector sequences. The arrows denote promoter sequences 
located on the DNA molecule, and face in the direction of transcription. 
Transcribed regions include all sequences located downstream of each promoter.
The vector activation exon is depicted as an open box. Exons from a gene
encoded in the cloned genomic fragment are depicted as hatched boxes. The solid 
boxes indicate the transposon signals. It is recognized that there is directionality 
to the transposon signals, and that the signals are oriented in the configuration 
suitable for the type of transposition reaction (integration, inversion, or deletion).
The following designations were used: splice donor site (S/D) and
polyadenylation signal (pA). To integrate the vector into the genomic fragment, 
the activation vector is incubated with the cloned genomic DNA in the presence 
of transposase. Following integration of the activation vector into the genomic 
fragment, the plasmid may be transfected directly into an appropriate eukaryotic
host cell to express the gene located downstream of the vector integration site.
Alternatively, the BAC plasmid may be transformed into E. coli to produce larger 
quantities of plasmid for transfection into the appropriate eukaryotic host cell. 

FIG. 29A-29B. Nucleotide sequence of pRIG14.

FIG. 30A-30C. Nucleotide sequence of pRIG19.

FIG. 31A-31C. Nucleotide sequence of pRIG20.

FIG. 32A-32C. Nucleotide sequence of pRIGad1.

FIG. 33A-33D. Nucleotide sequence of pRIGbd1.

FIG. 34A-34B. Nucleotide sequence of pUniBAC.

FIG. 35A-35B. Nucleotide sequence of pRIG22.

FIG. 36. Schematic diagram of pRIG-TP. The vector is shown in its 
linearized form. The horizontal line represents a DNA molecule. The arrows 
denote promoters. Open boxes indicate exons. Filled boxes represent transposon 
recombination signals (from Tn5, compatible with the in vitro transposition kit
available from Epicentre Technologies). The following designations were used:
splice donor site (S/D), puromycin resistance gene (puro), dihydrofolate reductase
gene (DHFR), Epstein Barr nuclear antigen-1 replication protein (EBNA-1),
Epstein Barr virus origin of replication (ori P), poly (A) signal (pA), and activation
exon (AE). It is understood that the activation exon can contain any sequence 
capable of directing protein synthesis, including a translation start codon in any 
reading frame, a partial secretion signal sequence, an entire secretion signal 
sequence, an epitope tag, a protein, a portion of a protein, or a protein motif. The 
activation exon may also lack a translation start codon. 

FIG. 37A-37C. Nucleotide sequence of pRIG-T. 

DETAILED DESCRIPTION OF THE INVENTION 

There are great advantages to gene activation by non-homologous 
recombination over other gene activation procedures. Unlike previous methods 
of protein over-expression, the methods described herein do not require that the 
gene of interest be cloned (isolated from the cell). Nor do they require any 
knowledge of the DNA sequence or structure of the gene to be over-expressed 
(i.e., the sequence of the ORF, introns, exons, or upstream and downstream 
regulatory elements) or knowledge of a gene's expression patterns (i.e., tissue 
specificity, developmental regulation, etc.). Furthermore, the methods do not 
require any knowledge pertaining to the genomic organization of the gene of 
interest (i.e., the intron and exon structure). 

The methods of the present invention thus involve vector constructs that 
do not contain target nucleotide sequences for homologous recombination. A 
target sequence allows homologous recombination of vector DNA with cellular 
DNA at a predetermined site on the cellular DNA, the site having homology for
sequences in the vector, the homologous recombination at the predetermined site
resulting in the introduction of the transcriptional regulatory sequence into the
genome and the subsequent endogenous gene activation.

The method of the present invention does not involve integration of the
vector at predetermined sites. Instead, the present methods involve integration of 
the vector constructs of the invention into cellular DNA (e.g., the cellular genome)
by nonhomologous or "illegitimate" recombination, also called "non-targeted gene
activation." In related embodiments, the present invention also concerns
non-targeted gene activation. Non-targeted gene activation has a number of important
applications. First, by activating genes that are not normally expressed in a given 
cell type, it becomes possible to isolate a cDNA copy of genes independent of 
their normal expression pattern. This facilitates isolation of genes that are
normally expressed in rare cells, during short developmental periods, and/or at
very low levels. Second, by translationally activating genes, it is possible to
produce protein expression libraries without the need for cloning the full-length 
cDNA. These libraries can be screened for new enzymes and proteins and/or for 
interesting phenotypes resulting from over-expression of an endogenous gene.
Third, cell lines over-expressing a specific protein can be created and used to
produce commercial quantities of protein. Thus, activating endogenous genes
provides a powerful approach to discovering and isolating new genes and proteins, 
and to producing large amounts of specific proteins for commercialization. 

The vectors described herein do not contain target sequences. A target
sequence is a sequence on the vector that has homology with a sequence or
sequences within the gene to be activated or upstream of the gene to be activated,
the upstream region being up to and including the first functional splice acceptor 
site on the same coding strand of the gene of interest, and by means of which 
homology the transcriptional regulatory sequence that activates the gene of
interest is integrated into the genome of the cell containing the gene to be
activated. In the case of an enhancer integration vector for activating an
endogenous gene, the vector does not contain homology to any sequence in the 
genome upstream or downstream of the gene of interest (or within the gene of 
interest) for a distance extending as far as enhancer function is operative.

The present methods, therefore, are capable of identifying new genes that
have been or can be missed using conventional and currently available cloning 
techniques. By using the constructs and methodology described herein, unknown 
and/or uncharacterized genes can be rapidly identified and over-expressed to
produce proteins. The proteins have use as, among other things, human
therapeutics and diagnostics and as targets for drug discovery.

The methods are also capable of producing over-expression of known 
and/or characterized genes for in vitro or in vivo protein production.

The term "known" gene refers to the level of characterization of a gene. The
invention allows expression of genes that have been characterized, as well as
expression of genes that have not been characterized. Different levels of
characterization are possible. These include detailed characterization, such as 
cloning, DNA, RNA, and/or protein sequencing, and relating the regulation and 
function of the gene to the cloned sequence (e.g., recognition of promoter and
enhancer sequences, functions of the open reading frames, introns, and the like).
Characterization can be less detailed, such as having mapped a gene and related
function, or having a partial amino acid or nucleotide sequence, or having purified 
a protein and ascertained a function. Characterization may be minimal, as when 
a nucleotide or amino acid sequence is known or a protein has been isolated but
the function is unknown. Alternatively, a function may be known but the
associated protein or nucleotide sequence is not known or is known but has not
been correlated to the function. Finally, there may be no characterization in that 
both the existence of the gene and its function are not known. The invention 
allows expression of any gene at any of these or other specific degrees of
characterization.

Many different proteins (also referred to herein interchangeably as "gene 
products" or "expression products") can be activated or over-expressed by a single 
activation construct and in a single set of transfections. Thus, a single cell or 
different cells in a set of transfectants (library) can over-express more than one
protein following transfection with the same or different constructs. Previous
activation methods require a unique construct to be created for each gene to be
activated. 

Further, many different integration sites adjacent to a single gene can be 
created and tested simultaneously using a single construct. This allows rapid 
determination of the optimal genomic location of the activation construct for 
protein expression. 

Using previous methods, the 5' end of the gene of interest had to be 
extensively characterized with respect to sequence and structure. For each 
activation construct to be produced, an appropriate targeting sequence had to be 
isolated. Usually, this had to be an isogenic sequence isolated from the same person
or laboratory strain of animal as the cells to be activated. In some cases, this DNA
may be 50 kb or more from the gene of interest. Thus, production of each
targeting construct required an arduous amount of cloning and sequencing of the 
endogenous gene. However, since sequence and structure information is not 
required for the methods of the present invention, unknown genes and genes with 
uncharacterized upstream regions can be activated. 

This is made possible by in situ gene activation using non-homologous
recombination of exogenous DNA sequences with cellular DNA. Methods and
compositions (e.g., vector constructs) required to accomplish such in situ gene
activation using non-homologous recombination are provided by the present
invention. 

DNA molecules can recombine to redistribute their genetic content by 
several different and distinct mechanisms, including homologous recombination, 
site-specific recombination, and non-homologous/illegitimate recombination. 
Homologous recombination involves recombination between stretches of DNA 
that are highly similar in sequence. It has been demonstrated that homologous 
recombination involves pairing between the homologous sequences along their 
length prior to redistribution of the genetic material. The exact site of crossover 
can be at any point in the homologous segments. The efficiency of recombination 
is proportional to the length of homologous targeting sequence (Hope,
Development 113:399 (1991); Reddy et al., J. Virol. 65:1507 (1991)), the degree
of sequence identity between the two recombining sequences (von Melchner et al.,
Genes Dev. 6:919 (1992)), and the ratio of homologous to non-homologous DNA
present on the construct (Letson, Genetics 117:159 (1987)).

Site-specific recombination, on the other hand, involves the exchange of 
genetic material at a predetermined site, designated by specific DNA sequences. 
In this reaction, a protein recombinase binds to the recombination signal 
sequences, creates a strand scission, and facilitates DNA strand exchange. 
Cre/Lox recombination is an example of site-specific recombination.

Non-homologous/illegitimate recombination, such as that used 
advantageously by the methods of the present invention, involves the joining 
(exchange or redistribution) of genetic material that does not share significant 
sequence homology and does not occur at site-specific recombination sequences. 
Examples of non-homologous recombination include integration of exogenous 
DNA into chromosomes at non-homologous sites, chromosomal translocations 
and deletions, DNA end-joining, double strand break repair of chromosome ends, 
bridge-breakage fusion, and concatemerization of transfected sequences. In most 
cases, non-homologous recombination is thought to occur through the joining of 
"free DNA ends." Free ends are DNA molecules that contain an end capable of 
being joined to a second DNA end either directly, or following repair or 
processing. The DNA end may consist of a 5' overhang, 3' overhang, or blunt 
end. 

As used herein, retroviral insertion and other transposition reactions are 
loosely considered forms of non-homologous recombination. These reactions do 
not involve the use of homology between the recombining molecules. 
Furthermore, unlike site-specific recombination, these types of recombination 
reactions do not occur between discrete sites. Instead, a specific protein/DNA 
complex is required on only one of the recombination partners (i.e., the retrovirus 
or transposon), with the second DNA partner (i.e., the cellular genome) usually 
being relatively non-specific. As a result, these "vectors" do not integrate into the
cellular genome in a targeted fashion, and therefore they can be used to deliver the
activation construct according to the present invention. 

Vector constructs useful for the methods described herein ideally may 
contain a transcriptional regulatory sequence that undergoes non-homologous 
recombination with genomic sequences in a cell to over-express an endogenous 
gene in that cell. The vector constructs of the invention also lack homologous 
targeting sequences. That is, they do not contain DNA sequences that target host 
cell DNA and promote homologous recombination at the target site. Thus, 
integration of the vector constructs of the present invention into the cellular 
genome occurs by non-homologous recombination, and can lead to 
over-expression of a cellular gene via the introduced transcriptional regulatory 
sequence contained on the integrated vector construct. 

The invention is generally directed to methods for over-expressing an 
endogenous gene in a cell, comprising introducing a vector containing a 
transcriptional regulatory sequence into the cell, allowing the vector to integrate 
into the genome of the cell by non-homologous recombination, and allowing 
over-expression of the endogenous gene in the cell. The method does not require 
previous knowledge of the sequence of the endogenous gene or even of the 
existence of the gene. Where the sequence of the gene to be activated is known, 
however, the constructs can be engineered to contain the proper configuration of 
vector elements (e.g., location of the start codon, addition of codons present in the 
first exon of the endogenous gene, and the proper reading frame) to achieve 
maximal overexpression and/or the appropriate protein sequence. 
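
The role of the reading frame can be illustrated with a brief sketch. The Python below is a simplified illustration under stated assumptions: the function names, the three activation exon sequences, and the downstream exon sequence are all hypothetical and are not taken from any vector described herein. The sketch shows that the number of nucleotides between the start codon and the splice junction determines how the downstream exon is read. The phase values it prints are not mapped onto the frame labels 1, 2, and 3 used elsewhere herein.

```python
def junction_phase(activation_exon_coding: str) -> int:
    """Nucleotides of an incomplete codon left at the splice junction (0, 1, or 2)."""
    if not activation_exon_coding.startswith("ATG"):
        raise ValueError("coding portion must begin with the ATG start codon")
    return len(activation_exon_coding) % 3


def codons(spliced: str) -> list:
    """Split a spliced coding sequence into complete codons from its 5' end."""
    usable = len(spliced) - len(spliced) % 3
    return [spliced[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical downstream exon whose intended codons are GCT GAA AAA.
exon_ii = "GCTGAAAAA"

# Three hypothetical activation exons that differ only in length after ATG.
for exon in ("ATG", "ATGG", "ATGGC"):
    print(exon, junction_phase(exon), codons(exon + exon_ii))
```

Only the first hypothetical exon reads the downstream exon in its intended frame. The second places a TGA stop codon in frame, which is one reason a set of vectors covering all three frames is useful when the downstream gene is unknown.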

In certain embodiments of the invention, the cell containing the vector may 
be screened for expression of the gene. 

The cell over-expressing the gene can be cultured in vitro under conditions 
favoring the production, by the cell, of desired amounts of the gene product of the 
endogenous gene that has been activated or whose expression has been increased. 
If desired, the gene product can then be isolated or purified to use, for example, 
in protein therapy or drug discovery.

Alternatively, the cell expressing the desired gene product can be allowed
to express the gene product in vivo. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence. 

Alternatively, the vector construct can consist essentially of the 
transcriptional regulatory sequence and one or more amplifiable markers. 

The invention, therefore, is also directed to methods for over-expressing 
an endogenous gene in a cell, comprising introducing a vector containing a 
transcriptional regulatory sequence and an amplifiable marker into the cell, 
allowing the vector to integrate into the genome of the cell by non-homologous 
recombination, and allowing over-expression of the endogenous gene in the cell. 

The cell containing the vector is screened for over-expression of the gene. 

The cell over-expressing the gene is cultured such that amplification of the 
endogenous gene is obtained. The cell can then be cultured in vitro so as to 
produce desired amounts of the gene product of the amplified endogenous gene 
that has been activated or whose expression has been increased. The gene product 
can then be isolated and purified. 

Alternatively, following amplification, the cell can be allowed to express 
the endogenous gene and produce desired amounts of the gene product in vivo. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence and the splice donor sequence. 

The invention, therefore, is also directed to methods for over-expressing 
an endogenous gene in a cell comprising introducing a vector containing a 
transcriptional regulatory sequence and an unpaired splice donor sequence into the 
cell, allowing the vector to integrate into the genome of the cell by 
non-homologous recombination, and allowing over-expression of the endogenous 
gene in the cell. 

The cell containing the vector is screened for expression of the gene.

The cell over-expressing the gene can be cultured in vitro so as to produce
desirable amounts of the gene product of the endogenous gene whose expression
has been activated or increased. The gene product can then be isolated and
purified. 

Alternatively, the cell can be allowed to express the desired gene product
in vivo.

The vector construct can consist essentially of a transcriptional regulatory 
sequence operably linked to an unpaired splice donor sequence and also containing 
an amplifiable marker. 

Other activation vectors include constructs with a transcriptional 
regulatory sequence and an exonic sequence containing a start codon; a 
transcriptional regulatory sequence and an exonic sequence containing a 
translational start codon and a secretion signal sequence; constructs with a 
transcriptional regulatory sequence and an exonic sequence containing a 
translation start codon, and an epitope tag; constructs containing a transcriptional 
regulatory sequence and an exonic sequence containing a translational start codon, 
a signal sequence and an epitope tag; constructs containing a transcriptional 
regulatory sequence and an exonic sequence with a translation start codon, a 
signal secretion sequence, an epitope tag, and a sequence-specific protease site. 
In each of the above constructs, the exon on the construct is located immediately 
upstream of an unpaired splice donor site. 

The constructs can also contain a regulatory sequence, a selectable marker 
lacking a poly(A) signal, an internal ribosome entry site (ires), and an unpaired 
splice donor site (FIG. 4). A start codon, signal secretion sequence, epitope tag, 
and/or a protease cleavage site may optionally be included between the ires and 
the unpaired splice donor sequence. When this construct integrates upstream of 
a gene, the selectable marker will be efficiently expressed since a poly(A) site will 
be supplied by the endogenous gene. In addition, the downstream gene will also
be expressed since the ires will allow protein translation to initiate at the
downstream open reading frame (i.e., the endogenous gene). Thus, the message
produced by this activation construct will be polycistronic. The advantage of this 
construct is that integration events that do not occur near genes and in the proper
orientation will not produce a drug resistant colony. The reason for this is that
without a poly(A) tail (supplied by the endogenous gene), the neomycin resistance 
gene will not express efficiently. By reducing the number of nonproductive 
integration events, the complexity of the library can be reduced without affecting 
its coverage (the number of genes activated), and this will facilitate the screening 
process. 
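
The effect of a selectable marker lacking a poly(A) signal on library complexity can be sketched as a simple filter. The Python below is a toy model only: the class, the function name, and the five integration events are invented for illustration, and marker expression is treated as all-or-nothing, which real cells do not guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Integration:
    gene: Optional[str]      # gene downstream of the integrated vector, if any
    same_orientation: bool   # vector promoter faces the same way as the gene


def is_drug_resistant(event: Integration) -> bool:
    """The marker has no poly(A) signal of its own, so in this model it is
    expressed only when a downstream endogenous gene supplies one."""
    return event.gene is not None and event.same_orientation


# Hypothetical integration events, invented for illustration.
events = [
    Integration("geneA", True),
    Integration("geneA", False),
    Integration(None, True),
    Integration("geneB", True),
    Integration(None, False),
]

colonies = [e for e in events if is_drug_resistant(e)]
# Genes that could be activated at all (vector upstream, proper orientation).
activatable = {e.gene for e in events if e.gene is not None and e.same_orientation}
recovered = {e.gene for e in colonies}

print(len(events), len(colonies))   # 5 2
print(activatable == recovered)     # True
```

In this model the events that yield a resistant colony are exactly those able to activate a gene, so the number of colonies to screen falls while the set of activated genes is unchanged.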

In another embodiment of this construct, cre-lox recombination sequences 
can be included between the regulatory sequence and the mo start codon and 
between the ires and the unpaired splice donor site (between the ires and the start 
codon, if present). Following isolation of cells that have activated the gene of 
interest, the neo gene and ires can be removed by transfecting the cells with a 
plasmid encoding the Cre recombinase. This would eliminate the production of the 
polycistronic message and allow the endogenous gene to be expressed directly 
from the regulatory sequence on the integrated activation construct. Use of Cre 
recombination to facilitate deletion of genetic elements from mammalian 
chromosomes has been described (Gu et al., Science 265:103 (1994); Sauer, 
Meth. Enzymology 225:890-900 (1993)). 

Thus, constructs useful in the methods described herein include, but are 
not limited to, the following (See also Figures 1-4): 

1) Construct with a regulatory sequence and an exon lacking a translation 
start codon. 

2) Construct with a regulatory sequence and an exon lacking a translation 
start codon followed by a splice donor site. 

3) Construct with a regulatory sequence and an exon containing a translation 
start codon in reading frame 1 (relative to the splice donor site), followed 
by an unpaired splice donor site. 

4) Construct with a regulatory sequence and an exon containing a translation 
start codon in reading frame 2 (relative to the splice donor site), followed 
by an unpaired splice donor site. 

5) Construct with a regulatory sequence and an exon containing a translation 
start codon in reading frame 3 (relative to the splice donor site), followed 
by an unpaired splice donor site. 

6) Construct with a regulatory sequence and an exon containing a translation 
start codon and a signal secretion sequence in reading frame 1 (relative to 
the splice donor site), followed by an unpaired splice donor site. 

7) Construct with a regulatory sequence and an exon containing a translation 
start codon and a signal secretion sequence in reading frame 2 (relative to 
the splice donor site), followed by an unpaired splice donor site. 

8) Construct with a regulatory sequence and an exon containing a translation 
start codon and a signal secretion sequence in reading frame 3 (relative to 
the splice donor site), followed by an unpaired splice donor site. 

9) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon and an epitope tag in reading frame 1 (relative 
to the splice donor site), followed by an unpaired splice donor site. 

10) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon and an epitope tag in reading frame 2 (relative 
to the splice donor site), followed by an unpaired splice donor site. 

11) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon and an epitope tag in reading frame 3 (relative 
to the splice donor site), followed by an unpaired splice donor site. 

12) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, and an epitope 
tag in reading frame 1 (relative to the splice donor site), followed by an 
unpaired splice donor site. 

13) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, and an epitope 
tag in reading frame 2 (relative to the splice donor site), followed by an 
unpaired splice donor site. 

14) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, and an epitope 
tag in reading frame 3 (relative to the splice donor site), followed by an 
unpaired splice donor site. 

15) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, an epitope tag, 
and a sequence specific protease site in reading frame 1 (relative to the 
splice donor site), followed by an unpaired splice donor site. 

16) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, an epitope tag, 
and a sequence specific protease site in reading frame 2 (relative to the 
splice donor site), followed by an unpaired splice donor site. 

17) Construct with a regulatory sequence and an exon containing (from 5' to 
3') a translation start codon, a signal secretion sequence, an epitope tag, 
and a sequence specific protease site in reading frame 3 (relative to the 
splice donor site), followed by an unpaired splice donor site. 

18) Construct with a regulatory sequence linked to a selectable marker, 
followed by an internal ribosome entry site, and an unpaired splice donor 
site. 

19) Construct 18 in which a cre/lox recombination signal is located between 
a) the regulatory sequence and the open reading frame of the selectable 
marker and b) between the ires and the unpaired splice donor site. 

20) Construct with a regulatory sequence operably linked to an exon 
containing green fluorescent protein lacking a stop codon, followed by an 
unpaired splice donor site. 
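
Constructs 3 through 17 in the list above follow a regular pattern: five sets of exon elements, each provided in three reading frames. A minimal sketch of that enumeration is given here for orientation only. The names `FEATURE_SETS`, `READING_FRAMES`, and `enumerate_constructs` are illustrative assumptions and do not appear in the specification.

```python
from itertools import product

# Exon element sets for constructs 3-17, in the order they appear in the list.
# These tuples are an illustrative encoding, not part of the specification.
FEATURE_SETS = [
    ("start codon",),
    ("start codon", "signal secretion sequence"),
    ("start codon", "epitope tag"),
    ("start codon", "signal secretion sequence", "epitope tag"),
    ("start codon", "signal secretion sequence", "epitope tag",
     "sequence specific protease site"),
]
READING_FRAMES = (1, 2, 3)


def enumerate_constructs(first_number=3):
    """Return (number, features, frame) for each element set in each frame."""
    rows = []
    number = first_number
    # product() varies the reading frame fastest, matching the list order.
    for features, frame in product(FEATURE_SETS, READING_FRAMES):
        rows.append((number, features, frame))
        number += 1
    return rows


if __name__ == "__main__":
    for number, features, frame in enumerate_constructs():
        print(f"{number}) exon: {', '.join(features)}; "
              f"reading frame {frame}; unpaired splice donor site")
```

Constructs 1, 2, and 18 through 20 do not fit this pattern and are left out of the sketch.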

It is to be understood, however, that any vector used in the methods 
described herein can include one or more (i.e., one, two, three, four, five, or more, 
and most preferably one or two) amplifiable markers. Accordingly, methods can 
include a step in which the endogenous gene is amplified. Placement of one or 
more amplifiable markers on the activation construct results in the juxtaposition 
of the gene of interest and the one or more amplifiable markers in the activated 
cell. Once the activated cell has been isolated, expression can be further increased 
by selecting for cells containing an increased copy number of the locus containing 
both the gene of interest and the activation construct. This can be accomplished 
by selection methods known in the art, for example by culturing cells in selective 
culture media containing one or more selection agents that are specific for the one 
or more amplifiable markers contained on the genetic construct or vector. 

Following activation of an endogenous gene by nonhomologous 
integration of any of the vectors described above, the expression of the 
endogenous gene may be further increased by selecting for increased copies of the 
amplifiable marker(s) located on the integrated vector. While such an approach 
may be accomplished using one amplifiable marker on the integrated vector, in an 
alternative embodiment the invention provides such methods wherein two or more 
(i.e., two, three, four, five, or more, and most preferably two) amplifiable markers 
may be included on the vector to facilitate more efficient selection of cells that 
have amplified the vector and flanking gene of interest. This approach is 
particularly useful in cells that have a functional endogenous copy of one or more 
of the amplifiable marker(s) that are contained on the vector, since the selection 
procedure can result in isolation of cells that have incorrectly amplified the 
endogenous amplifiable marker(s) rather than the vector-encoded amplifiable 
marker(s). This approach is also useful to select against cells that develop 
resistance to the selective agent by mechanisms that do not involve gene 
amplification. The approach using two or more amplifiable markers is 
advantageous in these situations because the probability of a cell developing 
resistance to two or more selective agents (resistance to which is encoded by two 
or more amplifiable markers) without amplifying the integrated vector and flanking 
gene of interest is significantly lower than the probability of the cell developing 
resistance to any single selective agent. Thus, by selecting for two or more vector 
encoded amplifiable markers, either simultaneously or sequentially, a greater 
percentage of cells that are ultimately isolated will contain the amplified vector and 
gene of interest. 
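
The probability argument above can be illustrated with a short calculation. This is only a sketch: it assumes, hypothetically, that a cell escapes each selective agent independently, and the per-agent rate of 1e-4 is a placeholder chosen for illustration, not a measured value. The function name `escape_probability` is likewise an assumption.

```python
def escape_probability(per_agent_rates):
    """Probability that a cell resists every agent without amplifying the vector.

    Assumes (hypothetically) that the non-amplification escape events for the
    different selective agents are independent, so their rates multiply.
    """
    probability = 1.0
    for rate in per_agent_rates:
        probability *= rate
    return probability


# Placeholder rates, for illustration only.
one_marker = escape_probability([1e-4])
two_markers = escape_probability([1e-4, 1e-4])

if __name__ == "__main__":
    print(f"one marker:  {one_marker:.1e}")
    print(f"two markers: {two_markers:.1e}")
```

Under the independence assumption, adding a second marker multiplies the background of unwanted resistant cells by the second agent's escape rate, which is why a greater fraction of surviving cells is expected to carry the amplified vector.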

Thus, in another embodiment, the vectors of the invention may contain two 
or more (i.e., two, three, four, five, or more, and most preferably two) amplifiable 
markers. This approach allows more efficient amplification of the vector 
sequences and adjacent gene of interest following activation of expression. 

Examples of amplifiable markers that may be used in constructing the present 
vectors include, but are not limited to, dihydrofolate reductase, adenosine 
deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate 
synthase. 

It is also understood that any of the constructs described herein may 
contain a eukaryotic viral origin of replication, either in place of, or in conjunction 
with an amplifiable marker. The presence of the viral origin of replication allows 
the integrated vector and adjacent endogenous gene to be isolated as an episome 
and/or amplified to high copy number upon introduction of the appropriate viral 
replication protein. Examples of useful viral origins include, but are not limited 
to, SV40 ori and EBV oriP. 

The invention also encompasses embodiments in which the constructs 
disclosed herein consist essentially of the components specifically described for 
these constructs. It is also understood that the above constructs are examples of 
constructs useful in the methods described herein, but that the invention 
encompasses functional equivalents of such constructs. 

The term "vector" is understood to generally refer to the vehicle by which 
the nucleotide sequence is introduced into the cell. It is not intended to be limited 
to any specific sequence. The vector could itself be the nucleotide sequence that 
activates the endogenous gene or could contain the sequence that activates the 
endogenous gene. Thus, the vector could be simply a linear or circular 
polynucleotide containing essentially only those sequences necessary for 
activation, or could be these sequences in a larger polynucleotide or other 
construct such as a DNA or RNA viral genome, a whole virion, or other biological 
construct used to introduce the critical nucleotide sequences into a cell. It is also 
understood that the phrase "vector construct" or the term "construct" may be used 
interchangeably with the term "vector" herein. 

The vector can contain DNA sequences that exist in nature or that have 
been created by genetic engineering or synthetic processes. 

The construct, upon nonhomologous integration into the genome of a cell, 
can activate expression of an endogenous gene. Expression of the endogenous 
gene may result in production of full length protein, or in production of a 
truncated biologically active form of the endogenous protein, depending on the 
integration site (e.g., upstream region versus intron 2). The activated gene may 
be a known gene (e.g., previously cloned or characterized) or unknown gene 
(previously not cloned or characterized). The function of the gene may be known 
or unknown. 

Examples of proteins with known activities include, but are not limited to, 
cytokines, growth factors, neurotransmitters, enzymes, structural proteins, cell 
surface receptors, intracellular receptors, hormones, antibodies, and transcription 
factors. Specific examples of known proteins that can be produced by this method 
include, but are not limited to, erythropoietin, insulin, growth hormone, 
glucocerebrosidase, tissue plasminogen activator, granulocyte-colony stimulating 
factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF), 
macrophage colony-stimulating factor (M-CSF), interferon α, interferon β, 
interferon γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8, 
interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14, 
TGF-β, blood clotting factor V, blood clotting factor VII, blood clotting factor 
VIII, blood clotting factor IX, blood clotting factor X, TSH-β, bone growth 
factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, 
anti-thrombin III, leukemia inhibitory factor, glucagon, Protein C, protein 
kinase C, stem cell factor, follicle stimulating hormone β, urokinase, nerve growth 
factors, insulin-like growth factors, insulinotropin, parathyroid hormone, 
lactoferrin, complement inhibitors, platelet derived growth factor, keratinocyte 
growth factor, hepatocyte growth factor, endothelial cell growth factor, 
neurotropin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha 
glucosidase, epidermal growth factor, and fibroblast growth factor. The invention 
also allows the activation of a variety of genes expressing transmembrane proteins, 
and production and isolation of such proteins, including but not limited to cell 
surface receptors for growth factors, hormones, neurotransmitters and cytokines 
such as those described above, transmembrane ion channels, cholesterol receptors, 
receptors for lipoproteins (including LDLs and HDLs) and other lipid moieties, 
integrins and other extracellular matrix receptors, cytoskeletal anchoring proteins, 
immunoglobulin receptors, CD antigens (including CD2, CD3, CD4, CD8, and 
CD34 antigens), and other cell surface transmembrane structural and functional 
proteins that are known in the art. As one of ordinary skill will appreciate, other 
cellular proteins and receptors that are known in the art may also be produced by 
the methods of the invention. 

One of the advantages of the method described herein is that virtually any 
gene can be activated. However, since genes have different genomic structures, 
including different intron/exon boundaries and locations of start codons, a variety 
of activation constructs is provided to activate the maximum number of different 
genes within a population of cells. 

These constructs can be transfected separately into cells to produce 
libraries. Each library contains cells with a unique set of activated genes. Some 
genes will be activated by several different activation constructs. In addition, 
portions of a gene can be activated to produce truncated, biologically active 
proteins. Truncated proteins can be produced, for example, by integration of an 
activation construct into introns or exons in the middle of an endogenous gene 
rather than upstream of the second exon. 

Use of different constructs also allows the activated gene to be modified 
to contain new sequences. For example, a secretion signal sequence can be 
included on the activation construct to facilitate the secretion of the activated 
gene. In some cases, depending on the intron/exon structure of the gene of 
interest, the secretion signal sequence can replace all or part of the signal sequence 
of the endogenous gene. In other cases, the signal sequence will allow a protein 
which is normally located intracellularly to be secreted. 

The regulatory sequence on the vector can be a constitutive promoter. 
Alternatively, the promoter may be inducible. Use of inducible promoters will 
allow low basal levels of activated protein to be produced by the cell during 
routine culturing and expansion. The cells may then be induced to produce large 
amounts of the desired proteins, for example, during manufacturing or screening. 
Examples of inducible promoters include, but are not limited to, the tetracycline 
inducible promoter and the metallothionein promoter. 

In preferred embodiments of the invention, the regulatory sequence on the 
vectors of the invention may be a promoter, an enhancer, or a repressor, any of 
which may be tissue specific. 

The regulatory sequence on the vector can be isolated from cellular or viral 
genomes. Examples of cellular regulatory sequences include, but are not limited 
to, regulatory elements from the actin gene, metallothionein I gene, 
immunoglobulin genes, casein I gene, serum albumin gene, collagen gene, globin 
genes, laminin gene, spectrin gene, ankyrin gene, sodium/potassium ATPase gene, 
and tubulin gene. Examples of viral regulatory sequences include, but are not 
limited to, regulatory elements from Cytomegalovirus (CMV) immediate early 
gene, adenovirus late genes, SV40 genes, retroviral LTRs, and Herpesvirus genes. 
Typically, regulatory sequences contain binding sites for transcription factors such 
as NF-κB, SP-1, TATA binding protein, AP-1, and CAAT binding protein. 
Functionally, the regulatory sequence is defined by its ability to promote, enhance, 
or otherwise alter transcription of an endogenous gene. 

In certain preferred embodiments, the regulatory sequence is a viral 
promoter. In particularly preferred embodiments, the promoter is the CMV 
immediate early gene promoter. In alternative embodiments, the regulatory 
element is a cellular, non-viral promoter. 

In alternative preferred embodiments, the regulatory element may be or 
may contain an enhancer. In particularly preferred such embodiments, the 
enhancer is the cytomegalovirus immediate early gene enhancer. In alternative 
embodiments, the enhancer is a cellular, non-viral enhancer. 

In alternative preferred embodiments, the regulatory element may be or 
may contain a repressor. In particularly preferred such embodiments, the 
repressor may be a viral repressor or a cellular, non-viral repressor. 

The transcriptional regulatory sequence can also comprise one or more 
scaffold-attachment regions or matrix attachment sites, negative regulatory 
elements, and transcription factor binding sites. Regulatory sequences can also 
include locus control regions. 

The invention also encompasses the use of retrovirus transcriptional 
regulatory sequences, e.g., long terminal repeats. Where these are used, however, 
they are not necessarily linked to any retrovirus sequence that materially affects 
the function of the transcriptional regulatory sequence as a promoter or enhancer 
of transcription of the endogenous gene to be activated (i.e., the cellular gene with 
which the transcriptional regulatory sequence recombines to activate). 

The vector constructs of the invention may also comprise a regulatory 
sequence which is not operably linked to exonic sequences on the vector. For 
example, when the regulatory element is an enhancer, it can integrate near an 
endogenous gene (e.g., upstream, downstream, or in an intron) and stimulate 
expression of the gene from its endogenous promoter. By this mechanism of 
activation, exonic sequences from the vector are absent in the transcript of the 
activated gene. 

Alternatively, the regulatory element may be operably linked to an exon. 
The exon may be a naturally occurring sequence or may be non-naturally 
occurring (e.g., produced synthetically). To activate endogenous genes lacking 
a start codon in their first exon (e.g., follicle stimulating hormone-β), a start codon 
is preferably omitted from the exon on the vector. To activate endogenous genes 
containing a start codon in the first exon (e.g., erythropoietin and growth 
hormone), the exon on the vector preferably contains a start codon, usually ATG 
and preferably an efficient translation initiation site (Kozak, J. Mol. Biol. 196: 947 
(1987)). The exon may contain additional codons following the start codon. 
These codons may be derived from a naturally occurring gene or may be 
non-naturally occurring (e.g., synthetic). The codons may be the same as the 
codons present in the first exon of the endogenous gene to be activated. 
Alternatively, the codons may be different than the codons present in the first exon 
of the endogenous gene. For example, the codons may encode an epitope tag, 
signal secretion sequence, transmembrane domain, selectable marker, or 
screenable marker. Optionally, an unpaired splice donor site may be present 
immediately 3' of the exonic sequence. When the structure of the gene to be 
activated is known, the splice donor site should be placed adjacent to the vector 
exon in a location such that the codons in the vector will be in frame with the 
codons of the second exon of the endogenous gene following splicing. When the 
structure of the endogenous gene to be activated is not known, separate 
constructs, each containing a different reading frame, are used. 
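
The frame requirement described above reduces to arithmetic modulo 3. The following sketch assumes a simplified model in which the vector exon's coding length is counted from the A of the start codon to the last exonic base before the splice donor site. The construct names, lengths, and function names are hypothetical and chosen only to illustrate why three constructs cover every case; they are not mapped to the "reading frame 1, 2, 3" labels used in the construct list.

```python
def in_frame_after_splicing(vector_coding_length, exon2_leading_bases):
    """Return True if the vector codons stay in frame with the second exon.

    vector_coding_length: bases from the A of the start codon through the
        last exonic base before the splice donor site.
    exon2_leading_bases: number of bases (0, 1, or 2) at the start of the
        endogenous second exon that complete a codon begun upstream.
    """
    if exon2_leading_bases not in (0, 1, 2):
        raise ValueError("exon2_leading_bases must be 0, 1, or 2")
    return (vector_coding_length + exon2_leading_bases) % 3 == 0


def compatible_constructs(construct_lengths, exon2_leading_bases):
    """List the constructs whose exon splices in frame to the second exon."""
    return [name for name, length in construct_lengths.items()
            if in_frame_after_splicing(length, exon2_leading_bases)]


# Hypothetical constructs differing only in coding length (one per frame).
constructs = {"frame_a": 30, "frame_b": 31, "frame_c": 32}

if __name__ == "__main__":
    for leading in (0, 1, 2):
        print(leading, compatible_constructs(constructs, leading))
```

In this model exactly one of the three constructs is in frame for any given second exon, which is consistent with using a separate construct for each reading frame when the gene structure is unknown.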

Operably linked is defined as a configuration that allows transcription 
through the designated sequence(s). For example, a regulatory sequence that is 
operably linked to an exonic sequence indicates that the exonic sequence is 
transcribed. When a start codon is present on the vector, operably linked also 
indicates that the open reading frame from the vector exon is in frame with the 
open reading frame of the endogenous gene. Following nonhomologous 
integration, the regulatory sequence (e.g., a promoter) on the vector becomes 
operably linked to an endogenous gene and facilitates transcription initiation, at 
a site generally referred to as a CAP site. Transcription proceeds through the 
exonic elements on the vector (and, if present, through the start codon, open 
reading frame, and/or unpaired splice donor site), and through the endogenous 
gene. The primary transcript produced by this operable linkage is spliced to create 
a chimeric transcript containing exonic sequences from both the vector and the 
endogenous gene. This transcript is capable of producing the endogenous protein 
when translated. 

An exon or "exonic sequence" is defined as any transcribed sequence that 
is present in the mature RNA molecule. The exon on the vector may contain 
untranslated sequences, for example, a 5' untranslated region. Alternatively, or 
in conjunction with the untranslated sequences, the exon may contain coding 
sequences such as a start codon and open reading frame. The open reading frame 
can encode naturally occurring amino acid sequences or non-naturally occurring 
amino acid sequences (e.g., synthetic codons). The open reading frame may also 
encode a signal secretion sequence, epitope tag, exon, selectable marker, 
screenable marker, or nucleotides that function to allow the open reading frame 
to be preserved when spliced to an endogenous gene. 

Splicing of primary transcripts, the process by which introns are removed, 
is directed by a splice donor site and a splice acceptor site, located at the 5' and 
3' ends of introns, respectively. The consensus sequence for splice donor sites is 
(A/C)AG GURAGU (where R represents a purine nucleotide) with nucleotides 
in positions 1-3 located in the exon and nucleotides GURAGU located in the 
intron. 
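
As an illustration of the consensus just given, the sketch below scans an RNA string for exact matches to (A/C)AG GURAGU and reports the exon/intron boundary of each match. Requiring an exact match is a simplifying assumption, since naturally occurring splice donor sites frequently deviate from the consensus. The function name and the example sequence are hypothetical.

```python
import re

# (A/C)AG | GURAGU, where R is a purine (A or G).
# The first three bases are exonic; GURAGU lies in the intron.
# A lookahead is used so that overlapping candidates are all reported.
DONOR_CONSENSUS = re.compile(r"(?=[AC]AGGU[AG]AGU)")


def find_consensus_donor_sites(rna):
    """Return 0-based indices of the first intronic base (the G of GU)."""
    return [match.start() + 3
            for match in DONOR_CONSENSUS.finditer(rna.upper())]


if __name__ == "__main__":
    example = "CCAUGGCAAGGUAAGUCCUU"  # hypothetical sequence
    for boundary in find_consensus_donor_sites(example):
        print("exon ends:", example[:boundary][-3:],
              "| intron starts:", example[boundary:boundary + 6])
```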

An unpaired splice donor site is defined herein as a splice donor site 
present on the activation construct without a downstream splice acceptor site. 
When the vector is integrated by nonhomologous recombination into a host cell's 
genome, the unpaired splice donor site becomes paired with a splice acceptor site 
from an endogenous gene. The splice donor site from the vector, in conjunction 
with the splice acceptor site from the endogenous gene, will then direct the 
excision of all of the sequences between the vector splice donor site and the 
endogenous splice acceptor site. Excision of these intervening sequences removes 
sequences that interfere with translation of the endogenous protein. 
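
The pairing of the vector splice donor site with an endogenous splice acceptor site can be pictured as a simple string operation on the primary transcript. The sketch below is a toy model under stated assumptions: all sequences are hypothetical, the boundaries are supplied by the caller rather than predicted, and lowercase is used only to mark the excised region.

```python
def splice(primary_transcript, donor_boundary, acceptor_boundary):
    """Join the vector exon to the endogenous exon, excising what lies between.

    donor_boundary: index of the first intronic base after the vector exon.
    acceptor_boundary: index of the first base of the endogenous exon that
        follows the splice acceptor site.
    """
    if not 0 <= donor_boundary <= acceptor_boundary <= len(primary_transcript):
        raise ValueError("need 0 <= donor <= acceptor <= transcript length")
    return (primary_transcript[:donor_boundary]
            + primary_transcript[acceptor_boundary:])


# Hypothetical sequences, for illustration only.
vector_exon = "ACCAUGGCU"
intervening = "guaagu" + "cccccc" + "uuucag"  # donor ... genomic ... acceptor
endogenous_exon = "GGAUCCUAA"

primary = vector_exon + intervening + endogenous_exon
chimeric = splice(primary,
                  donor_boundary=len(vector_exon),
                  acceptor_boundary=len(vector_exon) + len(intervening))

if __name__ == "__main__":
    print(chimeric)
```

The resulting transcript contains exonic sequence from both the vector and the endogenous gene, with the intervening sequence removed.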

The terms upstream and downstream, as used herein, are intended to mean 
in the 5' or in the 3' direction, respectively, relative to the coding strand. The 
term "upstream region" of a gene is defined as the nucleic acid sequence 5' of its 
second exon (relative to the coding strand) up to and including the last exon of the 
first adjacent gene having the same coding strand. Functionally, the upstream 
region is any site 5' of the second exon of an endogenous gene capable of allowing 
a nonhomologously integrated vector to become operably linked to the 
endogenous gene. 

The vector construct can contain a selectable marker to facilitate the 
identification and isolation of cells containing a nonhomologously integrated 
activation construct. Examples of selectable markers include genes encoding 
neomycin resistance (neo), hypoxanthine phosphoribosyl transferase (HPRT), 
puromycin resistance (pac), dihydro-orotase, glutamine synthetase (GS), histidine D (his D), 
carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug 
resistance 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl 
transferase (gpt), and adenosine deaminase (ada). 

Alternatively, the vector can contain a screenable marker, in place of or in 
addition to, the selectable marker. A screenable marker allows the cells containing 
the vector to be isolated without placing them under drug or other selective 
pressures. Examples of screenable markers include genes encoding cell surface 
proteins, fluorescent proteins, and enzymes. The vector-containing cells may be 
isolated, for example, by FACS using fluorescently-tagged antibodies to the cell 
surface protein or substrates that can be converted to fluorescent products by a 
vector encoded enzyme. 

Alternatively, selection can be effected by phenotypic selection for a trait 
provided by the endogenous gene product. The activation construct, therefore, 
can lack a selectable marker other than the "marker" provided by the endogenous 
gene itself. In this embodiment, activated cells can be selected based on a 
phenotype conferred by the activated gene. Examples of selectable phenotypes 
include cellular proliferation, growth factor independent growth, colony 
formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle 
cell, epithelial cell, etc.), anchorage independent growth, activation of cellular 
factors (e.g., kinases, transcription factors, nucleases, etc.), expression of cell 
surface receptors/proteins, gain or loss of cell-cell adhesion, migration, and 
cellular activation (e.g., resting versus activated T cells). 

A selectable marker may also be omitted from the construct when 
transfected cells are screened for gene activation products without selecting for 
the stable integrants. This is particularly useful when the efficiency of stable 
integration is high. 

The vector may contain one or more (i.e., one, two, three, four, five, or 
more, and most preferably one or two) amplifiable markers to allow for selection 
of cells containing increased copies of the integrated vector and the adjacent 
activated endogenous gene. Examples of amplifiable markers include but are not 
limited to dihydrofolate reductase (DHFR), adenosine deaminase (ada), 
dihydro-orotase, glutamine synthetase (GS), and carbamyl phosphate synthase 
(CAD). 

The vector may contain eukaryotic viral origins of replication useful for 
gene amplification. These origins may be present in place of, or in conjunction 
with, an amplifiable marker. 

The vector may also contain genetic elements useful for the propagation 
of the construct in micro-organisms. Examples of useful genetic elements include 
microbial origins of replication and antibiotic resistance markers. 

These vectors, and any of the vectors disclosed herein, and obvious 
variants recognized by one of ordinary skill in the art, can be used in any of the 
methods described herein to form any of the compositions producible by those 
methods. 

Nonhomologous integration of the construct into the genome of a cell 
results in the operable linkage between the regulatory elements from the vector 
and the exons from an endogenous gene. In preferred embodiments, the insertion 
of the vector regulatory sequences is used to upregulate expression of the 
endogenous gene. Upregulation of gene expression includes converting a 
transcriptionally silent gene to a transcriptionally active gene. It also includes 
enhancement of gene expression for genes that are already transcriptionally active, 
but produce protein at levels lower than desired. In other embodiments, 
expression of the endogenous gene may be affected in other ways such as 
downregulation of expression, creation of an inducible phenotype, or changing the 
tissue specificity of expression. 

According to the invention, in vitro methods of production of a gene 
expression product may comprise, for example, (a) introducing a vector of the 
invention into a cell; (b) allowing the vector to integrate into the genome of the 
cell by non-homologous recombination; (c) allowing over-expression of an 
endogenous gene in the cell by upregulation of the gene by the transcriptional 
regulatory sequence contained on the vector; (d) screening the cell for 
over-expression of the endogenous gene; and (e) culturing the cell under 
conditions favoring the production of the expression product of the endogenous 
gene by the cell. Such in vitro methods of the invention may further comprise 
isolating the expression product to produce an isolated gene expression product. 
In such methods, any art-known method of protein isolation may be 
advantageously used, including but not limited to chromatography (e.g., HPLC, 
FPLC, LC, ion exchange, affinity, size exclusion, and the like), precipitation (e.g., 
ammonium sulfate precipitation, immunoprecipitation, and the like), 
electrophoresis, and other methods of protein isolation and purification that will 
be familiar to one of ordinary skill in the art. 

Analogously, in vivo methods of production of a gene expression product 
may comprise, for example, (a) introducing a vector of the invention into a cell; 
(b) allowing the vector to integrate into the genome of the cell by 
non-homologous recombination; (c) allowing over-expression of an endogenous 
gene in the cell by upregulation of the gene by the transcriptional regulatory 
sequence contained on the vector; (d) screening the cell for over-expression of the 
endogenous gene; and (e) introducing the isolated and cloned cell into a eukaryote 
under conditions favoring the overexpression of the endogenous gene by the cell 
in vivo in the eukaryote. According to this aspect of the invention, any eukaryote 
may be advantageously used, including fungi (particularly yeasts), plants, and 
animals, more preferably animals, still more preferably vertebrates, and most 
preferably mammals, particularly humans. In certain related embodiments, the 
invention provides such methods which further comprise isolating and cloning the 
cell prior to introducing it into the eukaryote. 

As used herein the phrases "conditions favoring the production" of an 
expression product, "conditions favoring the overexpression" of a gene, and 
"conditions favoring the activation" of a gene, in a cell or by a cell in vitro refer 
to any and all suitable environmental, physical, nutritional or biochemical 
parameters that allow, facilitate, or promote production of an expression product, 
or overexpression or activation of a gene, by a cell in vitro. Such conditions may, 
of course, include the use of culture media, incubation, lighting, humidity, etc., 
that are optimal or that allow, facilitate, or promote production of an expression 
product, or overexpression or activation of a gene, by a cell in vitro. Analogously, 
as used herein the phrases "conditions favoring the production" of an expression 
product, "conditions favoring the overexpression" of a gene, and "conditions 
favoring the activation" of a gene, in a cell or by a cell in vivo refer to any and all 
suitable environmental, physical, nutritional, biochemical, behavioral, genetic, and 
emotional parameters under which an animal containing a cell is maintained, that 
allow, facilitate, or promote production of an expression product, or 
overexpression or activation of a gene, by a cell in a eukaryote in vivo. Whether 
a given set of conditions are favorable for gene expression, activation, or 
overexpression, in vitro or in vivo, may be determined by one of ordinary skill 
using the screening methods described and exemplified below, or other methods 
for measuring gene expression, activation, or overexpression that are routine in 
the art. 

As used herein, the phrase "activating an endogenous gene" means 
inducing the production of a transcript encoding the endogenous gene at levels 
higher than those normally found in the cell containing the endogenous gene. In 
some applications, "activating an endogenous gene" may also mean producing the 
protein, or a portion of the protein, encoded by the endogenous gene at levels 
higher than those normally found in the cell containing the endogenous gene. 

The invention also encompasses cells made by any of the above methods. 
The invention encompasses cells containing the vector constructs, cells in which 
the vector constructs have integrated, and cells which are over-expressing desired 
gene products from an endogenous gene, over-expression being driven by the 
introduced transcriptional regulatory sequence. 

Cells used in this invention can be derived from any eukaryotic species and 
can be primary, secondary, or immortalized. Furthermore, the cells can be derived 
from any tissue in the organism. Examples of useful tissues from which cells can 
be isolated and activated include, but are not limited to, liver, kidney, spleen, bone 
marrow, thymus, heart, muscle, lung, brain, testes, ovary, islet, intestinal, bone 
marrow, skin, bone, gall bladder, prostate, bladder, embryos, and the immune and 
hematopoietic systems. Cell types include fibroblast, epithelial, neuronal, stem, 
and follicular. However, any cell or cell type can be used to activate gene 
expression using this invention. 

The methods can be carried out in any cell of eukaryotic origin, such as 
fungal, plant or animal. Preferred embodiments include vertebrates and 
particularly mammals, and more particularly, humans. 

The construct can be integrated into primary, secondary, or immortalized 
cells. Primary cells are cells that have been isolated from a vertebrate and have 
not been passaged. Secondary cells are primary cells that have been passaged, but 
are not immortalized. Immortalized cells are cell lines that can be passaged, 
apparently indefinitely. 

In preferred embodiments, the cells are immortalized cell lines. Examples 
of immortalized cell lines include, but are not limited to, HT1080, HeLa, Jurkat, 
293 cells, KB carcinoma, T84 colonic epithelial cell line, Raji, Hep G2 or Hep 3B 
hepatoma cell lines, A2058 melanoma, U937 lymphoma, and WI38 fibroblast cell 
line, somatic cell hybrids, and hybridomas. 

Cells used in this invention can be derived from any eukaryotic species, 
including but not limited to mammalian cells (such as rat, mouse, bovine, porcine, 
sheep, goat, and human), avian cells, fish cells, amphibian cells, reptilian cells, 
plant cells, and yeast cells. Preferably, overexpression of an endogenous gene or 
gene product from a particular species is accomplished by activating gene 
expression in a cell from that species. For example, to overexpress endogenous 
human proteins, human cells are used. Similarly, to overexpress endogenous 
bovine proteins, for example bovine growth hormone, bovine cells are used. 

The cells can be derived from any tissue in the eukaryotic organism. 
Examples of useful vertebrate tissues from which cells can be isolated and 
activated include, but are not limited to, liver, kidney, spleen, bone marrow, 
thymus, heart, muscle, lung, brain, immune system (including lymphatic), testes, 
ovary, islet, intestinal, stomach, bone marrow, skin, bone, gall bladder, prostate, 
bladder, zygotes, embryos, and hematopoietic tissue. Useful vertebrate cell types 
include, but are not limited to, fibroblasts, epithelial cells, neuronal cells, germ 
cells (i.e., spermatocytes/spermatozoa and oocytes), stem cells, and follicular cells. 
Examples of plant tissues from which cells can be isolated and activated include, 
but are not limited to, leaf tissue, ovary tissue, stamen tissue, pistil tissue, root 
tissue, tubers, gametes, seeds, embryos, and the like. One of ordinary skill will 
appreciate, however, that any eukaryotic cell or cell type can be used to activate 
gene expression using the present invention. 

Any of the cells produced by any of the methods described are useful for 
screening for expression of a desired gene product and for providing desired 
amounts of a gene product that is over-expressed in the cell. The cells can be 
isolated and cloned. 

Cells produced by this method can be used to produce protein in vitro 
(e.g., for use as a protein therapeutic) or in vivo (e.g., for use in cell therapy). 

Commercial growth and production conditions often vary from the 
conditions used to grow and prepare cells for analytical use (e.g., cloning, protein 
or nucleic acid sequencing, raising antibodies, X-ray crystallography analysis, 
enzymatic analysis, and the like). Scale up of cells for growth in roller bottles 
involves increase in the surface area on which cells can attach. Microcarrier beads 
are, therefore, often added to increase the surface area for commercial growth. 
Scale up of cells in spinner culture may involve large increases in volume. Five 
liters or greater can be required for both microcarrier and spinner growth. 
Depending on the inherent potency (specific activity) of the protein of interest, the 
volume can be as low as 1-10 liters. 10-15 liters is more common. However, up 
to 50-100 liters may be necessary and volume can be as high as 10,000-15,000 
liters. In some cases, higher volumes may be required. Cells can also be grown 
in large numbers of T flasks, for example 50-100. 

Beyond growth conditions, protein purification on a commercial scale can 
also vary considerably from purification for analytic purposes. Protein purification 
in a commercially practical context can initially start from the mass equivalent of 10 liters 
of cells at approximately 10^4 cells/ml. The cell mass equivalent used to begin protein 
purification can also be as high as 10 liters of cells at up to 10^6 or 10^7 cells/ml. As 
one of ordinary skill will appreciate, however, a higher or lower initial cell mass 
equivalent may also be advantageously used in the present methods. 
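
By way of non-limiting illustration, the starting cell mass equivalents quoted above can be converted into total cell numbers with a short calculation. The following Python sketch is offered only as a worked example; the function name `total_cells` is hypothetical and does not appear elsewhere in this specification.

```python
def total_cells(volume_liters: float, cells_per_ml: float) -> float:
    """Return the total number of cells in a culture.

    Illustrative helper (hypothetical name): volume in liters is converted
    to milliliters and multiplied by the cell density in cells/ml.
    """
    return volume_liters * 1000.0 * cells_per_ml


# Lower starting point quoted in the text: 10 liters at about 10^4 cells/ml.
low_end = total_cells(10, 1e4)

# Upper starting point quoted in the text: 10 liters at up to 10^7 cells/ml.
high_end = total_cells(10, 1e7)

print(f"low end:  {low_end:.0e} cells")
print(f"high end: {high_end:.0e} cells")
```

Under these figures the starting material spans roughly three orders of magnitude in total cell number, which is one reason commercial purification schemes can differ from analytical ones.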

Another commercial growth condition, especially when the ultimate 
product is used clinically, is cell growth in serum-free medium, by which is 
intended medium containing no serum or not in amounts that are required for cell 
growth. This obviously avoids the undesired co-purification of toxic contaminants 
(e.g., viruses) or other types of contaminants, for example, proteins that would 
complicate purification. Serum-free media for growth of cells, commercial 
sources for such media, and methods for cultivation of cells in serum-free media, 
are well-known to those of ordinary skill in the art. 

A single cell made by the methods described above can over-express a 
single gene or more than one gene. More than one gene can be activated by the 
integration of a single construct or by the integration of multiple constructs in the 
same cell (i.e., more than one type of construct). Therefore, a cell can contain 
only one type of vector construct or different types of constructs, each capable of 
activating an endogenous gene. 

The invention is also directed to methods for making the cells described 
above by one or more of the following: introducing one or more of the vector 
constructs; allowing the introduced construct(s) to integrate into the genome of 
the cell by non-homologous recombination; allowing over-expression of one or 
more endogenous genes in the cell; and isolating and cloning the cell. 

The term "transfection" has been used herein for convenience when 
discussing introducing a polynucleotide into a cell. However, it is to be 
understood that the specific use of this term has been applied to generally refer to 
the introduction of the polynucleotide into a cell and is also intended to refer to 
the introduction by other methods described herein such as electroporation, 
liposome-mediated introduction, retrovirus-mediated introduction, and the like (as 
well as according to its own specific meaning). 

The vector can be introduced into the cell by a number of methods known 
in the art. These include, but are not limited to, electroporation, calcium 
phosphate precipitation, DEAE dextran, lipofection, receptor-mediated 
endocytosis, polybrene, particle bombardment, and microinjection. Alternatively, 
the vector can be delivered to the cell as a viral particle (either replication 
competent or deficient). Examples of viruses useful for the delivery of nucleic 
acid include, but are not limited to, adenoviruses, adeno-associated viruses, 
retroviruses, herpesviruses, and vaccinia viruses. Other viruses suitable for 
delivery of nucleic acid molecules into cells that are known to one of ordinary skill 
may be equivalently used in the present methods. 

Following transfection, the cells are cultured under conditions, as known 
in the art, suitable for nonhomologous integration between the vector and the host 
cell's genome. Cells containing the nonhomologously integrated vector can be 
further cultured under conditions, as known in the art, allowing expression of 
activated endogenous genes. 

The vector construct can be introduced into cells on a single DNA 
construct or on separate constructs and allowed to concatemerize. 

Whereas in preferred embodiments, the vector construct is a double- 
stranded DNA vector construct, vector constructs also include single-stranded 
DNA, combinations of single- and double-stranded DNA, single-stranded RNA, 
double-stranded RNA, and combinations of single- and double-stranded RNA. 
Thus, for example, the vector construct could be single-stranded RNA which is 
converted to cDNA by reverse transcriptase, the cDNA converted to double- 
stranded DNA, and the double-stranded DNA ultimately recombining with the 
host cell genome. 

In preferred embodiments, the constructs are linearized prior to 
introduction into the cell. Linearization of the activation construct creates free 
DNA ends capable of reacting with chromosomal ends during the integration 
process. In general, the construct is linearized downstream of the regulatory 
element (and exonic and splice donor sequences, if present). Linearization can be 
facilitated by, for example, placing a unique restriction site downstream of the 
regulatory sequences and treating the construct with the corresponding restriction 
enzyme prior to transfection. While not required, it is advantageous to place a 
"spacer" sequence between the linearization site and the proximal most functional 
element (e.g., the unpaired splice donor site) on the construct. When present, the 
spacer sequence protects the important functional elements on the vector from 
exonucleolytic degradation during the transfection process. The spacer can be 
composed of any nucleotide sequence that does not change the essential functions 
of the vector as described herein. 

Circular constructs can also be used to activate endogenous gene 
expression. It is known in the art that circular plasmids, upon transfection into 
cells, can integrate into the host cell genome. Presumably, DNA breaks occur in 
the circular plasmid during the transfection process, thereby generating free DNA 
ends capable of joining to chromosome ends. Some of these breaks in the 
construct will occur in a location that does not destroy essential vector functions 
(e.g., the break will occur downstream of the regulatory sequence), and therefore, 
will allow the construct to be integrated into a chromosome in a configuration 
capable of activating an endogenous gene. As described above, spacer sequences 
may be placed on the construct (e.g., downstream of the regulatory sequences). 
During transfection, breaks that occur in the spacer region will create free ends at 
a site in the construct suitable for activation of an endogenous gene following 
integration into the host cell genome. 

The invention also encompasses libraries of cells made by the above 
described methods. A library can encompass all of the clones from a single 
transfection experiment or a subset of clones from a single transfection 
experiment. The subset can over-express the same gene or more than one gene, 
for example, a class of genes. The transfection can have been done with a single 
type of construct or with more than one type of construct. 

A library can also be formed by combining all of the recombinant cells 
from two or more transfection experiments, by combining one or more subsets of 
cells from a single transfection experiment or by combining subsets of cells from 
separate transfection experiments. The resulting library can express the same 
gene, or more than one gene, for example, a class of genes. Again, in each of 
these individual transfections, a unique construct or more than one construct can 
be used. 

Libraries can be formed from the same cell type or different cell types. 
The library can be composed of a single type of cell containing a single 
type of activation construct which has been integrated into chromosomes at 
spontaneous DNA breaks or at breaks generated by radiation, restriction enzymes, 
and/or DNA breaking agents, applied either together (to the same cells) or 
separately (applied to individual groups of cells and then combining the cells 
together to produce the library). The library can be composed of multiple types 
of cells containing a single or multiple constructs which were integrated into the 
genome of a cell treated with radiation, restriction enzymes, and/or DNA breaking 
agents, applied either together (to the same cells) or separately (applied to 
individual groups of cells and then combining the cells together to produce the 
library). 

The invention is also directed to methods for making libraries by selecting 
various subsets of cells from the same or different transfection experiments. For 
example, all of the cells expressing nuclear factors (as determined by the presence 
of nuclear green fluorescent protein in cells transfected with construct 20) can be 
pooled to create a library of cells with activated nuclear factors. Similarly, cells 
expressing membrane or secreted proteins can be pooled. Cells can also be 
grouped by phenotype, for example, growth factor independent growth, growth 
factor independent proliferation, colony formation, cellular differentiation (e.g., 
differentiation into a neuronal cell, muscle cell, epithelial cell, etc.), anchorage 
independent growth, activation of cellular factors (e.g., kinases, transcription 
factors, nucleases, etc.), gain or loss of cell-cell adhesion, migration, or cellular 
activation (e.g., resting versus activated T cells). 

The invention is also directed to methods of using libraries of cells to 
over-express an endogenous gene. The library is screened for the expression of 
the gene and cells are selected that express the desired gene product. The cell can 
then be used to purify the gene product for subsequent use. Expression by the cell 
can occur by culturing the cell in vitro or by allowing the cell to express the gene 
in vivo. 

The invention is also directed to methods of using libraries to identify 
novel genes and gene products. 

The invention is also directed to methods for increasing the efficiency of 
gene activation by treating the cells with agents that stimulate or affect the 
patterns of non-homologous integration. It has been demonstrated that gene 
expression patterns, chromatin structure, and methylation patterns can differ 
dramatically from cell type to cell type. Even different cell lines from the same cell 
type can have significant differences. These differences can impact the patterns 
of non-homologous integration by affecting both the DNA breakage pattern and 
the repair process. For example, chromatinized stretches of DNA (characteristics 
likely associated with inactive genes) may be more resistant to breakage by 
restriction enzymes and chemical agents, whereas they may be susceptible to 
breakage by radiation. 

Furthermore, inactive genes can be methylated. In this case, restriction 
enzymes that are blocked by CpG methylation will be unable to cleave methylated 
sites near the inactive gene, making it more difficult to activate that gene using 
methylation-sensitive enzymes. These problems can be circumvented by creating 
activation libraries in several cell lines using a variety of DNA breakage agents. 
By doing this, a more complete integration pattern can be created and the 
probability of activating a given gene maximized. 

The methods of the invention can include introducing double strand breaks 
into the DNA of the cell containing the endogenous gene to be over-expressed. 
These methods introduce double-strand breaks into the genomic DNA in the cell 
prior to or simultaneously with vector integration. The mechanism of DNA 
breakage can have a significant effect on the pattern of DNA breaks in the 
genome. As a result, DNA breaks produced spontaneously or artificially with 
radiation, restriction enzymes, bleomycin, or other breaking agents, can occur in 
different locations. 

In order to increase integration efficiency and to improve the random 
distribution of integration sites, cells can be treated with low, intermediate, or high 
doses of radiation prior to or following transfection. By artificially inducing 
double strand breaks, the transfected DNA can now integrate into the host cell 
chromosome as part of the DNA repair process. Normally, creation of double 
strand breaks to serve as the site of integration is the rate limiting step. Thus, by 
increasing chromosome breaks using radiation (or other DNA damaging agents), 
a larger number of integrants can be obtained in a given transfection. 
Furthermore, the mechanism of DNA breakage by radiation is different than by 
spontaneous breakage. 

Radiation can induce DNA breaks directly when a high energy photon hits 
the DNA molecule. Alternatively, radiation can activate compounds in the cell 
which, in turn, react with and break the DNA strand. Spontaneous breaks, on the 
other hand, are thought to occur by the interaction between reactive compounds 
produced in the cell (such as superoxides and peroxides) and the DNA molecule. 
However, DNA in the cell is not present as a naked, deproteinized polymer, but 
instead is bound to chromatin and present in a condensed state. As a result, some 
regions are not accessible to agents in the cell that cause double strand breaks. 
The photons produced by radiation have wavelengths short enough to hit highly 
condensed regions of DNA, thereby inducing breaks in DNA regions that are 
under-represented in spontaneous breaks. Thus, radiation is capable of creating 
different DNA breakage patterns, which in turn, should lead to different 
integration patterns. 

As a result, libraries produced using the same activation construct in cells 
with and without radiation treatment will potentially contain different sets of 
activated genes. Finally, radiation treatment increases efficiency of 
nonhomologous integration by up to 5-10 fold, allowing complete libraries to be 
created using fewer cells. Thus, radiation treatment increases the efficiency of 
gene activation and generates new integration and activation patterns in 
transfected cells. Useful types of radiation include α, β, γ, x-ray, and ultraviolet 
radiation. Useful doses of radiation vary for different cell types, but in general, 
dose ranges resulting in cell viabilities of 0.1% to >99% are useful. For HT1080 
cells, this corresponds to radiation doses from a 137Cs source of approximately 
0.1 rads to 1000 rads. Other doses may also be useful as long as the dose either 
increases the integration frequency or changes the pattern of integration sites. 

In addition to radiation, restriction enzymes can be used to artificially 
induce chromosome breaks in transfected cells. As with radiation, DNA 
restriction enzymes can create chromosome breaks which, in turn, serve as 
integration sites for the transfected DNA. This larger number of DNA breaks 
increases the overall efficiency of integration of the activation construct. 
Furthermore, because the mechanism of breakage by restriction enzymes differs from that 
by radiation, the pattern of chromosome breaks is also likely to be different. 

Restriction enzymes are relatively large molecules compared to photons 
and small metabolites capable of damaging DNA. As a result, restriction enzymes 
will tend to break regions that are less condensed than the genome as a whole. If 
the gene of interest lies within an accessible region of the genome, then treatment 
of the cells with a restriction enzyme can increase the probability of integrating the 
activation construct upstream of the gene of interest. Since restriction enzymes 
recognize specific sequences, and since a given restriction site may not lie 
upstream of the gene of interest, a variety of restriction enzymes can be used. It 
may also be important to use a variety of restriction enzymes since each enzyme 
has different properties (e.g., size, stability, ability to cleave methylated sites, and 
optimal reaction conditions) that affect which sites in the host chromosome will 
be cleaved. Each enzyme, due to the different distribution of cleavable restriction 
sites, will create a different integration pattern. 

Therefore, introduction of restriction enzymes (or plasmids capable of 
expressing restriction enzymes) before, during, or after introduction of the 
activation construct will result in the activation of different sets of genes. Finally, 
restriction enzyme-induced breaks increase the integration efficiency by up to 5-10 
fold (Yorifuji et al., Mut. Res. 243:121 (1990)), allowing fewer cells to be 
transfected to produce a complete library. Thus, restriction enzymes can be used 
to create new integration patterns, allowing activation of genes which failed to be 
activated in libraries produced by non-homologous recombination at spontaneous 
breaks or at other artificially induced breaks. 

Restriction enzymes can also be used to bias integration of the activation 
construct to a desired site in the genome. For example, several rare restriction 
enzymes have been described which cleave eukaryotic DNA every 50-1000 
kilobases, on average. If a rare restriction recognition sequence happens to be 
located upstream of a gene of interest, by introducing the restriction enzyme at the 
time of transfection along with the activation construct, DNA breaks can be 
preferentially created upstream of the gene of interest. These breaks can then serve as 
sites for integration of the activation construct. Any enzyme can be used that cleaves 
in an appropriate location in or near the gene of interest and whose site is 
under-represented in the rest of the genome or over-represented near 
genes (e.g., restriction sites containing CpG). For genes that have not been 
previously identified, restriction enzymes with 8 bp recognition sites (e.g., NotI, 
SfiI, PmeI, SwaI, SseI, SrfI, SgrAI, PacI, AscI, SgfI, and Sse8387I), enzymes 
recognizing CpG containing sites (e.g., EagI, BsiWI, MluI, and BssHII) and other 
rare cutting enzymes can be used. 

In this way, "biased" libraries can be created which are enriched for certain 
types of activated genes. In this respect, restriction enzyme sites containing CpG 
dinucleotides are particularly useful since these sites are under-represented in the 
genome at large, but over-represented in the form of CpG islands at the 5' end of 
many genes, the very location that is useful for gene activation. Enzymes 
recognizing these sites, therefore, will preferentially cleave at the 5' end of genic 
sequences. 
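
As a non-limiting illustration of why 8 bp recognition sites are rare, the following Python sketch estimates the average spacing of a recognition site in a random sequence, counts the CpG dinucleotides in a site, and locates sites in a supplied sequence. It is a sketch under stated assumptions: equal base frequencies are assumed for the spacing estimate, the helper names (`expected_spacing_bp`, `cpg_count`, `find_sites`) are hypothetical, and only four of the enzymes named above are included, with their commonly published recognition sequences.

```python
import re
from typing import Dict, List

# Commonly published recognition sequences for four of the 8 bp cutters
# named in the text (selection is illustrative, not exhaustive).
SITES: Dict[str, str] = {
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
}


def expected_spacing_bp(site: str) -> int:
    """Average distance between occurrences of `site` in a random sequence,
    ASSUMING all four bases are equally frequent and independent."""
    return 4 ** len(site)


def cpg_count(site: str) -> int:
    """Number of CpG dinucleotides contained in the recognition sequence."""
    return site.count("CG")


def find_sites(sequence: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of `site`, including
    overlapping occurrences (a lookahead is used so matches may overlap)."""
    pattern = re.compile(f"(?={re.escape(site)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]


for name, site in SITES.items():
    print(name, site, expected_spacing_bp(site), "bp;", cpg_count(site), "CpG")
```

Under the equal-frequency assumption an 8 bp site occurs about once every 65,536 bp. Sites containing CpG would be expected to be rarer still in genomes where CpG dinucleotides are depleted, which is consistent with the under-representation described above; the sketch does not model that depletion.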

Restriction enzymes can be introduced into the host cell by several 
methods. First, restriction enzymes can be introduced into the cell by 
electroporation (Yorifuji et al., Mut. Res. 243:121 (1990); Winegar et al., Mut. 
Res. 225:49 (1989)). In general, the amount of restriction enzyme introduced into 
the cell is proportional to its concentration in the electroporation media. The 
pulse conditions must be optimized for each cell line by adjusting the voltage, 
capacitance, and resistance. Second, the restriction enzyme can be expressed 
transiently from a plasmid encoding the enzyme under the control of eukaryotic 
regulatory elements. The level of enzyme produced can be controlled by using 
inducible promoters, and varying the strength of induction. In some cases, it may 
be desirable to limit the amount of restriction enzyme produced (due to its 
toxicity). In these cases, weak or mutant promoters, splice sites, translation start 
codons, and poly(A) tails can be utilized to lower the amount of restriction 
enzyme produced. Third, restriction enzymes can be introduced by agents that 
fuse with or permeabilize the cell membrane. Liposomes and streptolysin O 
(Pimplikar et al., J. Cell Biol. 125:1025 (1994)) are examples of this type of 
agent. Finally, mechanical perforation (Beckers et al., Cell 50:523-534 (1987)) 
and microinjection can also be used to introduce nucleases and other proteins into 
cells. However, any method capable of delivering active enzymes to a living cell 
is suitable. 

DNA breaks induced by bleomycin and other DNA damaging agents can 
also produce DNA breakage patterns that are different. Thus, any agent or 
incubation condition capable of generating double strand breaks in cells is useful 
for increasing the efficiency and/or altering the sites of non-homologous 
recombination. Examples of classes of chemical DNA breaking agents include, 
but are not limited to, peroxides and other free radical generating compounds, 
alkylating agents, topoisomerase inhibitors, anti-neoplastic drugs, acids, 
substituted nucleotides, and enediyne antibiotics. 

Specific chemical DNA breaking agents include, but are not limited to, 
bleomycin, hydrogen peroxide, cumene hydroperoxide, tert-butyl hydroperoxide, 
hypochlorous acid (reacted with aniline, 1-naphthylamine or 1-naphthol), nitric 
acid, phosphoric acid, doxorubicin, 9-deoxydoxorubicin, demethyl-6-deoxyrubicin, 
5-iminodaunorubicin, adriamycin, 4'-(9-acridinylamino)methanesulfon-m-anisidide, 
neocarzinostatin, 8-methoxycaffeine, etoposide, ellipticine, 
iododeoxyuridine, and bromodeoxyuridine. 

It has been shown that DNA repair machinery in the cell can be induced 
by pre-exposing the cell to low doses of a DNA breaking agent such as radiation 
or bleomycin. By pretreating cells with these agents approximately 24 hours prior 
to transfection, the cell will be more efficient at repairing DNA breaks and 
integrating DNA following transfection. In addition, higher doses of radiation or 
other DNA breaking agents can be used since the LD50 (the dose that results in 
lethality in 50% of the exposed cells) is higher following pretreatment. This 
allows random activation libraries to be created at multiple doses and results in a 
different distribution of integration sites within the host cell's chromosomes. 

Screening 

Once an activation library (or libraries) is created, it can be screened using 
a number of assays. Depending on the characteristics of the protein(s) of interest 
(e.g., secreted versus intracellular proteins) and the nature of the activation 
construct used to create the library, any or all of the assays described below can 
be utilized. Other assay formats can also be used. 

ELISA. Activated proteins can be detected using the enzyme-linked 
immunosorbent assay (ELISA). If the activated gene product is secreted, culture 
supernatants from pools of activation library cells are incubated in wells containing 
bound antibody specific for the protein of interest. If a cell or group of cells has 
activated the gene of interest, then the protein will be secreted into the culture 
media. By screening pools of library clones (the pools can be from 1 to greater 
than 100,000 library members), pools containing a cell(s) that has activated the 
gene of interest can be identified. The cell of interest can then be purified away 
from the other library members by sib selection, limiting dilution, or other 
techniques known in the art. In addition to secreted proteins, ELISA can be used 
to screen for cells expressing intracellular and membrane-bound proteins. In these 
cases, instead of screening culture supernatants, a small number of cells is 
removed from the library pool (each cell is represented at least 100-1000 times in 
each pool), lysed, clarified, and added to the antibody-coated wells. 
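
By way of non-limiting illustration, the logic of narrowing a positive pool down to a single clone by sib selection can be sketched in Python as follows. This is a simplified model under stated assumptions, not a laboratory protocol: clones are represented by integer identifiers, the callable `is_positive` is a hypothetical stand-in for an assay readout (for example, an ELISA signal above background), the assay is assumed to be error-free, and the expansion of sub-pools in culture between rounds is ignored.

```python
from typing import Callable, List, Sequence


def sib_select(
    pool: Sequence[int],
    is_positive: Callable[[Sequence[int]], bool],
    n_subpools: int = 10,
) -> int:
    """Narrow a positive pool down to one clone by repeated subdivision.

    `is_positive` is a hypothetical stand-in for the assay performed on a
    pool; it must return True when the pool contains the clone of interest.
    """
    if n_subpools < 2:
        raise ValueError("n_subpools must be at least 2")
    current: List[int] = list(pool)
    if not is_positive(current):
        raise ValueError("starting pool does not contain a positive clone")
    while len(current) > 1:
        k = min(n_subpools, len(current))
        # Deal the clones into k sub-pools; each is smaller than `current`.
        subpools = [current[i::k] for i in range(k)]
        # Keep the first sub-pool that scores positive and repeat.
        current = next(sub for sub in subpools if is_positive(sub))
    return current[0]


# Illustrative run: 10,000 clones, one of which (ID 4242) is "activated".
clones = list(range(10_000))
found = sib_select(clones, lambda sub: 4242 in sub)
print(found)
```

With ten sub-pools per round, a pool of 10,000 clones is resolved in four rounds of subdivision in this model; a noisy assay would require replicate testing that the sketch does not attempt.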

ELISA Spot Assay. ELISA spot plates are coated with antibodies specific for 
the protein of interest. Following coating, the wells are blocked with 1% 
BSA/PBS for 1 hour at 37°C. Following blocking, 100,000 to 500,000 cells from 
the random activation library are applied to each well (representing ~10% of the 
total pool). In general, one pool is applied to each well. If the frequency of a cell 
expressing the protein of interest is 1 in 10,000 (i.e., the pool consists of 10,000 
individual clones, one of which expresses the protein of interest), then plating 
500,000 cells per well will yield 50 specific cells. Cells are incubated in the wells 
at 37°C for 24 to 48 hours without being moved or disturbed. At the end of the 
incubation, the cells are removed and the plate is washed 3 times with PBS/0.05% 
Tween 20 and 3 times with PBS/1% BSA. Secondary antibodies are applied to the 
wells at the appropriate concentration and incubated for 2 hours at room 
temperature or 16 hours at 4°C. These antibodies can be biotinylated or labeled 
directly with horseradish peroxidase (HRP). The secondary antibodies are 
removed and the plate is washed with PBS/1% BSA. The tertiary antibody or 
streptavidin labeled with HRP is added and incubated for 1 hour at room 
temperature. 
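
As a non-limiting worked example, the number of specific cells expected per well in the ELISA spot assay follows from the number of cells plated and the frequency of the expressing clone in the pool. The Python sketch below reproduces the figures given above; it assumes that every clone is equally represented in the pool, and the function name `expected_positive_cells` is hypothetical.

```python
def expected_positive_cells(
    cells_plated: int, clones_in_pool: int, positive_clones: int = 1
) -> float:
    """Expected number of expressing cells applied to one well.

    ASSUMPTION: every clone in the pool is equally represented, so the
    fraction of expressing cells equals positive_clones / clones_in_pool.
    """
    return cells_plated * positive_clones / clones_in_pool


# Figures from the text: a pool of 10,000 clones with one expressing clone.
upper = expected_positive_cells(500_000, 10_000)  # 500,000 cells per well
lower = expected_positive_cells(100_000, 10_000)  # 100,000 cells per well

print(upper, lower)
```

Plating at the lower end of the stated range (100,000 cells per well) would, under the same assumption, give about 10 specific cells per well.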

FACS assay. The fluorescence-activated cell sorter (FACS) can be used 
to screen the random activation library in a number of ways. If the gene of interest 
encodes a cell surface protein, then fluorescently-labeled antibodies are incubated 
with cells from the activation library. If the gene of interest encodes a secreted 
protein, then cells can be biotinylated and incubated with streptavidin conjugated 
to an antibody specific to the protein of interest (Manz et al., Proc. Natl. Acad. 
Sci. (USA) 92:1921 (1995)). Following incubation, the cells are placed in a high 
concentration of gelatin (or other polymer such as agarose or methylcellulose) to 
limit diffusion of the secreted protein. As protein is secreted by the cell, it is 
captured by the antibody bound to the cell surface. The presence of the protein 
of interest is then detected by a second antibody which is fluorescently labeled. 
For both secreted and membrane bound proteins, the cells can then be sorted 
according to their fluorescence signal. Fluorescent cells can then be isolated, 
expanded, and further enriched by FACS, limiting dilution, or other cell 
purification techniques known in the art. 

Magnetic Bead Separation. The principle of this technique is similar to
FACS. Membrane bound proteins and captured secreted proteins (as described
above) are detected by incubating the activation library with
antibody-conjugated magnetic beads that are specific for the protein of interest.
If the protein is present on the surface of a cell, the magnetic beads will bind to
that cell. Using a magnet, the cells expressing the protein of interest can be
purified away from the other cells in the library. The cells are then released from
the beads, expanded, analyzed, and further purified if necessary.

RT-PCR. A small number of cells (equivalent to at least the number of
individual clones in the pool) is harvested and lysed to allow purification of the
RNA. Following isolation, the RNA is reverse-transcribed using reverse
transcriptase. PCR is then carried out using primers specific for the cDNA of the
gene of interest.

Alternatively, primers can be used that span the synthetic exon in the 
activation construct and the exon of the endogenous gene. This primer will not 
hybridize to and amplify the endogenously expressed gene of interest. Conversely, 
if the activation construct has integrated upstream of the gene of interest and 
activated gene expression, then this primer, in conjunction with a second primer 
specific for the gene, will amplify the activated gene by virtue of the presence of
the synthetic exon spliced onto the exon from the endogenous gene. Thus, this 
method can be used to detect activated genes in cells that normally express the 
gene of interest at lower than desired levels. 
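
The selectivity of a primer spanning the synthetic exon/endogenous exon junction can be illustrated with a toy string model. This is a sketch under stated assumptions only: all sequences and names below are hypothetical, and exact substring matching stands in for primer annealing, which in practice tolerates mismatches and depends on reaction conditions.

```python
def anneals(primer, transcript):
    """Return True if the primer sequence occurs exactly in the transcript.

    Simplification: exact substring matching is used purely for illustration.
    """
    return primer in transcript


# Hypothetical sequences, chosen only for illustration.
SYNTHETIC_EXON = "ATGGCTAGCAAG"     # vector-encoded (synthetic) exon
ENDOGENOUS_EXON_1 = "CCGTTAGGCATC"  # first exon of the native transcript
ENDOGENOUS_EXON_2 = "GGATCCTTGACA"  # downstream exon of the gene of interest

# Native transcript: endogenous exons only.
native = ENDOGENOUS_EXON_1 + ENDOGENOUS_EXON_2
# Activated transcript: synthetic exon spliced onto the downstream endogenous exon.
activated = SYNTHETIC_EXON + ENDOGENOUS_EXON_2

# Junction primer: last 6 bases of the synthetic exon plus the first 6 bases
# of the endogenous exon it is spliced to.
junction_primer = SYNTHETIC_EXON[-6:] + ENDOGENOUS_EXON_2[:6]

print(anneals(junction_primer, activated))  # junction present in activated message
print(anneals(junction_primer, native))     # junction absent from native message
```

In this model the junction primer matches only the activated transcript, which is why the approach can be applied to cells that already express the gene of interest at a low level.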

Phenotypic Selection. In this embodiment, cells can be selected based on
a phenotype conferred by the activated gene. Examples of phenotypes that can 
be selected for include proliferation, growth factor independent growth, colony 
formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle 
cell, epithelial cell, etc.), anchorage independent growth, activation of cellular 
factors (e.g., kinases, transcription factors, nucleases, etc.), gain or loss of cell-cell 
adhesion, migration, and cellular activation (e.g., resting versus activated T cells). 
Isolation of activated cells demonstrating a phenotype, such as those described 
above, is important because the activation of an endogenous gene by the 
integrated construct is presumably responsible for the observed cellular phenotype. 
Thus, the activated gene may be an important therapeutic drug or drug target for 
treating or inducing the observed phenotype. 

The sensitivity of each of the above assays can be effectively increased by 
transiently upregulating gene expression in the library cells. This can be 
accomplished for NF-κB site-containing promoters (on the activation construct)
by adding PMA and tumor necrosis factor-α, e.g., to the library. Separately, or
in conjunction with PMA and TNF-α, sodium butyrate can be added to further
enhance gene expression. Addition of these reagents can increase expression of 
the protein of interest, thereby allowing a lower sensitivity assay to be used to 
identify the gene activated cell of interest. 

Since large activation libraries are created to maximize activation of many 
genes, it is advantageous to organize the library clones in pools. Each pool can 
consist of 1 to greater than 100,000 individual clones. Thus, in a given pool, many
activated proteins are produced, often in dilute concentrations (due to the overall 
size of the pool and the limited number of cells within the pool that produce a 
given activated protein). Thus, concentration of the proteins prior to screening 
effectively increases the ability to detect the activated proteins in the screening 
assay. One particularly useful method of concentration is ultrafiltration; however, 
other methods can also be used. For example, proteins can be concentrated 
non-specifically, or semi-specifically by adsorption onto ion exchange, 
hydrophobic, dye, hydroxyapatite, lectin, and other suitable resins under 
conditions that bind most or all proteins present. The bound proteins can then be 
removed in a small volume prior to screening. It is advantageous to grow the cells 
in serum free media to facilitate the concentration of proteins. 

In another embodiment, a useful sequence that can be included on the 
activation construct is an epitope tag. The epitope tag can consist of an amino 
acid sequence that allows affinity purification of the activated protein (e.g., on 
immunoaffinity or chelating matrices). Thus, by including an epitope tag on the 
activation construct, all of the activated proteins from an activation library can be 
purified. By purifying the activated proteins away from other cellular and media 
proteins, screening for novel proteins and enzyme activities can be facilitated. In 
some instances, it may be desirable to remove the epitope tag following 
purification of the activated protein. This can be accomplished by including a 
protease recognition sequence (e.g., Factor Xa or enterokinase cleavage site)
downstream from the epitope tag on the activation construct. Incubation of the
purified, activated protein(s) with the appropriate protease will release the epitope
tag from the protein(s).

In libraries in which an epitope tag sequence is located on the activation
construct, all of the activated proteins can be purified away from all other cellular
and media proteins using affinity purification. This not only concentrates the
activated proteins, but also purifies them away from other activities that can
interfere with the assay used to screen the library.

Once a pool of clones containing cells over-expressing the gene of interest
is identified, steps can be taken to isolate the activated cell. Isolation of the
activated cell can be accomplished by a variety of methods known in the art.
Examples of cell purification methods include limiting dilution, fluorescence
activated cell sorting, magnetic bead separation, sib selection, and single colony
purification using cloning rings.

In preferred embodiments of the invention, the methods include a process
wherein the expression product is purified. In highly preferred embodiments, the
cells expressing the endogenous gene product are cultured so as to produce
amounts of gene product feasible for commercial application, and especially
diagnostic, therapeutic, and drug discovery uses.

Any vector used in the methods described herein can include an amplifiable
marker. Thereby, amplification of both the vector and the DNA of interest (i.e.,
containing the over-expressed gene) occurs in the cell, and further enhanced
expression of the endogenous gene is obtained. Accordingly, methods can include
a step in which the endogenous gene is amplified.

Once the activated cell has been isolated, expression can be further
increased by amplifying the locus containing both the gene of interest and the
activation construct. This can be accomplished by each of the methods described
below, either separately or in combination.

Amplifiable markers are genes that can be selected for higher copy number.
Examples of amplifiable markers include dihydrofolate reductase, adenosine
deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate
synthase. For these examples, the elevated copy number of the amplifiable marker
and flanking sequences (including the gene of interest) can be selected for using
a drug or toxic metabolite which is acted upon by the amplifiable marker. In
general, as the drug or toxic metabolite concentration increases, cells containing
fewer copies of the amplifiable marker die, whereas cells containing increased 
copies of the marker survive and form colonies. These colonies can be isolated, 
expanded, and analyzed for increased levels of production of the gene of interest. 

Placement of an amplifiable marker on the activation construct results in 
the juxtaposition of the gene of interest and the amplifiable marker in the activated 
cell. Selection for activated cells containing increased copy number of the 
amplifiable marker and gene of interest can be achieved by growing the cells in the 
presence of increasing amounts of selective agent (usually a drug or metabolite). 
For example, amplification of dihydrofolate reductase (DHFR) can be selected 
using methotrexate. 

As drug-resistant colonies arise at each increasing drug concentration, 
individual colonies can be selected and characterized for copy number of the 
amplifiable marker and gene of interest, and analyzed for expression of the gene 
of interest. Individual colonies with the highest levels of activated gene expression 
can be selected for further amplification in higher drug concentrations. At the 
highest drug concentrations, the clones will express greatly increased amounts of 
the protein of interest. 

When amplifying DHFR, it is convenient to plate approximately 1 x 10^7
cells at several different concentrations of methotrexate. Useful initial 
concentrations of methotrexate range from approximately 5 nM to 100 nM. 
However, the optimal concentration of methotrexate must be determined 
empirically for each cell line and integration site. Following growth in 
methotrexate containing media, colonies from the highest concentration of 
methotrexate are picked and analyzed for increased expression of the gene of 
interest. The clone(s) with the highest concentration of methotrexate are then 
grown in higher concentrations of methotrexate to select for further amplification
of DHFR and the gene of interest. Methotrexate concentrations in the micromolar
and millimolar range can be used for clones containing the highest degree of gene 
amplification. 

Placement of a viral origin of replication(s) (e.g., ori P or SV40 in human
cells, and polyoma ori in mouse cells) on the activation construct will result in the
juxtaposition of the gene of interest and the viral origin of replication in the
activated cell. The origin and flanking sequences can then be amplified by
introducing the viral replication protein(s) in trans. For example, when ori P (the
origin of replication on Epstein-Barr virus) is utilized, EBNA-1 can be expressed
transiently or stably. EBNA-1 will initiate replication from the integrated ori P 
locus. The replication will extend from the origin bi-directionally. As each 
replication product is created, it too can initiate replication. As a result, many 
copies of the viral origin and flanking genomic sequences including the gene of 
interest are created. This higher copy number allows the cells to produce larger 
amounts of the gene of interest. 

At some frequency, the replication product will recombine to form a 
circular molecule containing flanking genomic sequences, including the gene of 
interest. Cells that contain circular molecules with the gene of interest can be 
isolated by single cell cloning and analysis by Hirt extraction and Southern 
blotting. Once purified, the cell containing the episomal genomic locus at elevated 
copy number (typically 10-50 copies) can be propagated in culture. To achieve 
higher amplification, the episome can be further boosted by including a second 
origin adjacent to the first in the original construct. For example, T antigen can 
be used to boost the copy number of ori P/SV40 episomes to a copy number of
~1000 (Heinzel et al., J. Virol. 62:3738 (1988)). This substantial increase in copy
number can dramatically increase protein expression. 

The invention encompasses over-expression of endogenous genes both in
vivo and in vitro. Therefore, the cells could be used in vitro to produce desired
amounts of a gene product or could be used in vivo to provide that gene product
in the intact animal.

The invention also encompasses the proteins produced by the methods
described herein. The proteins can be produced from either known, or previously
unknown genes. Examples of known proteins that can be produced by this
method include, but are not limited to, erythropoietin, insulin, growth hormone,
glucocerebrosidase, tissue plasminogen activator, granulocyte-colony stimulating
factor, granulocyte/macrophage colony stimulating factor, interferon α, interferon
β, interferon γ, interleukin-2, interleukin-6, interleukin-11, interleukin-12, TGF
β, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII,
blood clotting factor IX, blood clotting factor X, TSH-β, bone growth factor-2,
bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, anti-thrombin III,
leukemia inhibitory factor, glucagon, Protein C, protein kinase C, macrophage
colony stimulating factor, stem cell factor, follicle stimulating hormone β,
urokinase, nerve growth factors, insulin-like growth factors, insulinotropin,
parathyroid hormone, lactoferrin, complement inhibitors, platelet derived growth
factor, keratinocyte growth factor, neurotropin-3, thrombopoietin, chorionic
gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, FGF,
macrophage-colony stimulating factor, and cell surface receptors for each of the
above-described proteins.

Where the protein product from the activated cell is purified, any method
of protein purification known in the art may be employed.

Isolation of Cells Containing Activated Membrane Protein-Encoding Genes 

Genes that encode membrane associated proteins are particularly
interesting from a drug development standpoint. These genes and the proteins
they encode can be used, for example, to develop small molecule drugs using
combinatorial chemistry libraries and high throughput screening assays.
Alternatively, the proteins or soluble forms of the proteins (e.g., truncated proteins
lacking the transmembrane region) can be used as therapeutically active agents in
humans or animals. Identification of membrane proteins can also be used to
identify new ligands (e.g., cytokines, growth factors, and other effector molecules)
using two hybrid approaches or affinity capture techniques. Many other uses of
membrane proteins are also possible.

Current approaches to identifying genes that encode integral membrane
proteins involve isolation and sequencing of genes from cDNA libraries. Integral
membrane proteins are then identified by ORF analysis using hydrophobicity plots
capable of identifying the transmembrane region of the protein. Unfortunately,
using this approach a gene encoding an integral membrane protein cannot be
identified unless the gene is expressed in the cells used to produce the cDNA
library. Furthermore, many genes are only expressed in very rare cells, during
short developmental windows, and/or at very low levels. As a result, these genes
cannot be efficiently identified using the currently available approaches.

The present invention allows endogenous genes to be activated without
any knowledge of the sequence, structure, function, or expression profile of the
genes. Using the disclosed methods, genes may be activated at the transcription
level only, or at both the transcription and translation levels. As a result, proteins
encoded by the activated endogenous gene can be produced in cells containing the
integrated vector. Furthermore, using specific vectors disclosed herein, the
protein produced from the activated endogenous gene can be modified, for
example, to include an epitope tag. Other vectors (e.g., vectors 12-17 described
above) may encode a signal peptide followed by an epitope tag. This vector can
be used to isolate cells that have activated expression of an integral membrane
protein (see Example 5 below). This vector can also be used to direct secretion
of proteins that are not normally secreted.

Thus, the invention also is directed to methods for identifying an
endogenous gene encoding a cellular integral membrane protein or a
transmembrane protein. Such methods of the invention may comprise one or more
steps. For example, one such method of the invention may comprise (a)
introducing one or more vectors of the invention into a cell; (b) allowing the
vector to integrate into the genome of the cell by non-homologous recombination;
(c) allowing over-expression of an endogenous gene in the cell by upregulation of
the gene by the transcriptional regulatory sequence contained on the integrated
vector construct; (d) screening the cell for over-expression of the endogenous
gene; and (e) characterizing the activated gene to determine its identity as a gene
encoding a cellular integral membrane protein. In related embodiments, the
invention provides such methods further comprising isolating the activated gene
from the cell prior to characterizing the activated gene.

To identify genes that encode integral membrane proteins, vectors 
integrated into the genome of cells will comprise a regulatory sequence linked to 
an exonic sequence containing a start codon, a signal sequence, and an epitope 
tag, followed by an unpaired splice donor site. Upon integration and activation 
of an endogenous gene, a chimeric protein is produced containing the signal 
peptide and epitope tag from the vector fused to the protein encoded by the 
downstream exons of the endogenous gene. This chimeric protein, by virtue of 
the presence of the vector encoded signal peptide, is directed to the secretory 
pathway where translation of the protein is completed and the protein is secreted. 
If, however, the activated endogenous gene encodes an integral membrane 
protein, and the transmembrane region of that gene is encoded by exons located 
3' of the vector integration site, then the chimeric protein will go to the cell 
surface, and the epitope tag will be displayed on the cell surface. Using known 
methods of cell isolation (for example flow cytometric sorting, magnetic bead cell 
sorting, immunoadsorption, or other methods that will be familiar to one of 
ordinary skill in the art), antibodies to the epitope tag can then be used to isolate 
the cells from the population that display the epitope tag and have activated an 
integral membrane encoding gene. These cells can then be used to study the 
function of the membrane protein. Alternatively, the activated gene may then be 
isolated from these cells using any art-known method, e.g., through hybridization
with a DNA probe specific to the vector-encoded exon to screen a cDNA library 
produced from these cells, or using the genetic constructs described herein. 

The epitope tag encoded by the vector exon may be a short peptide 
capable of binding to an antibody, a short peptide capable of binding to a
substance (e.g., polyhistidine/divalent metal ion supports, maltose binding
protein/maltose supports, glutathione S-transferase/glutathione support), or an 
extracellular domain (lacking a transmembrane domain) from an integral 
membrane protein for which an antibody or ligand exists. It will be understood, 
however, that other types of epitope tags that are familiar to one of ordinary skill 
in the art may be used equivalently in accordance with the invention. 

Vectors for Non-targeted Activation of Endogenous Genes 

As noted above, non-targeted gene activation has a number of important 
applications, including activating endogenous genes in host cells which provides 
a powerful approach to discovering and isolating new genes and proteins, and to 
producing large amounts of specific proteins for commercialization. For some 
applications of non-targeted gene activation, it is desirable to create libraries of 
cells in which each member of the library contains an activation vector integrated
into a unique location in the host cell genome, and in which each member of the 
library has activated a different endogenous gene. Furthermore, it would be 
desirable to remove cells from the library that contain an integrated vector, but fail 
to activate an endogenous gene. Since eukaryotic genomes often contain large 
regions that lack genes, integration of an activation vector into a region devoid of 
genes can occur frequently. These integrated vectors, however, fail to activate an 
endogenous gene, and yet are capable of conferring drug resistance on the host 
cells when a selectable marker (driven by a suitable promoter and followed by a 
polyadenylation signal) is included on the activation vector. Even more 
problematic for gene discovery applications, a transcript containing vector 
sequences is produced in these cells regardless of whether or not a gene has been 
activated. In cases where a gene has not been activated, these vector sequence- 
containing transcripts contain non-genic genomic DNA sequences. As a result, 
when isolating activated genes, one cannot isolate all RNA (or cDNA) molecules 
that are derived from the integrated vector (i.e., transcripts containing vector
sequences), since many of these transcripts do not encode an endogenous gene.
To overcome these difficulties, the present invention provides highly specific
vectors and methods that facilitate isolation of vector-activated genes.

These vectors of the invention are useful for activating expression of
endogenous genes and for isolating the mRNA and cDNA corresponding to the
activated genes. One such vector reduces the number of cells in which the vector
integrated into the genome but failed to activate expression from (or transcription
through) an endogenous gene. By removing these cells, fewer library members
can be created and screened to isolate a given number of activated genes.
Furthermore, vector-containing cells that fail to activate gene expression produce
an RNA molecule that can interfere with isolation of bona fide activated genes.
Thus, the vectors disclosed herein are particularly useful for producing cells
suitable for protein over-expression and/or for isolating cDNA molecules
corresponding to activated genes. The second type of vector of the invention is
useful for isolating exon I from activated endogenous genes. As a result, these
vectors can be used to obtain full-length genes from activated RNA transcripts.

Each of the functional vector components described herein may be used 
separately, or in combination with each other. 



Poly(A) Trap Activation Vectors

To facilitate isolation of activated genes, the present invention provides
novel gene activation vectors that are capable of producing a drug resistant
colony, preferentially upon activation of an endogenous gene. Such vectors are
referred to herein as "poly(A) trap vectors." Examples of poly(A) trap vectors are
shown in Fig. 8A-8F. The nucleotide sequence of one such dual poly(A) trap
vector, designated pRIG21b, is shown in Fig. 15A-15B (SEQ ID NO: 19). These
vectors contain a transcriptional regulatory sequence (which may be any
transcriptional regulatory sequence, including but not limited to the promoters,
enhancers, and repressors described herein, and which preferably is a promoter or
an enhancer, and most preferably a promoter such as a CMV immediate early gene
promoter, an SV40 T antigen promoter, a tetracycline-inducible promoter, or a
β-actin promoter) operably linked to a selectable marker gene lacking a poly(A)
signal. Since the selectable marker gene lacks a polyadenylation signal, its 
message will not be stable, and the marker gene product will not be efficiently 
produced. However, if the activation vector integrates upstream of an 
endogenous gene, the selectable marker can utilize the polyadenylation signal of 
the endogenous gene, thereby allowing production of the selectable marker protein 
in sufficient amounts to confer drug resistance. Thus, cells that integrate this 
activation vector generally form a drug resistant colony only if an endogenous 
gene has been activated. 
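
The selection logic of a poly(A) trap vector, as described above, can be summarized as a small truth table. The following is a simplified sketch: the function and parameter names are hypothetical, and the model treats the outcome as all-or-nothing, whereas the text says only that resistant colonies "generally" form upon gene activation.

```python
def forms_resistant_colony(marker_has_own_polya, integrated_upstream_of_gene):
    """Simplified model of drug-resistant colony formation.

    The marker message is treated as stable only if it ends in a poly(A)
    signal, either its own or one borrowed from an endogenous gene located
    downstream of the integration site.
    """
    message_is_stable = marker_has_own_polya or integrated_upstream_of_gene
    return message_is_stable


# Enumerate the four cases: conventional vector vs. poly(A) trap vector,
# integration upstream of a gene vs. integration into a region lacking genes.
for own_polya in (True, False):
    for near_gene in (True, False):
        print(own_polya, near_gene, forms_resistant_colony(own_polya, near_gene))
```

In this model, a marker carrying its own poly(A) signal yields resistant colonies wherever the vector integrates, while a marker lacking one yields them only when a downstream endogenous gene supplies the signal.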

The poly(A) trap activation vectors can include any selectable or 
screenable marker. Furthermore, the selectable marker can be expressed from any 
promoter that is functional in the cells used to create the integration library. Thus, 
the selectable marker can be expressed by viral or non-viral promoters.
Optionally, an unpaired splice donor site may be included in the construct,
preferably 3' of the selectable marker to allow the exon encoding the selectable
marker to be spliced directly to the exons of the endogenous gene. When a
downstream transcriptional regulatory sequence and a splice donor site are included
on the vector, the inclusion of a splice donor site adjacent to the selectable marker 
results in the removal of these downstream elements from the messenger RNA. 

In a related embodiment, a second transcriptional regulatory sequence 
(which may be any transcriptional regulatory sequence, including but not limited 
to the promoters, enhancers, and repressors described herein, and which preferably 
is a promoter or an enhancer, and most preferably a promoter) may be located 
downstream of, and in the same orientation as, the selectable marker. Optionally, 
an unpaired splice donor site may be linked to the downstream transcriptional 
regulatory sequence. In this configuration, the poly(A) trap vector is capable of 
producing a message containing the downstream vector-encoded exon spliced to 
endogenous exons. As described below, these chimeric transcripts can be 
translated into native or modified protein, depending on the nature of the vector-
encoded exon.

As used herein, a "vector-encoded exon" means a region of a vector
downstream of the transcriptional regulatory sequence and between the
transcription start site and the unpaired splice donor site found on the vector. The
vector-encoded exon is present at the 5' end of the transcript containing the
endogenous gene in the fully processed message. Analogously, as used herein, a
"vector-encoded intron" is the region of the vector located downstream of the
unpaired splice donor site. When a linearization site is present on the vector, the
vector-encoded intron is the region of the vector that is downstream of the vector-
encoded exon between the unpaired splice donor site and the linearization site.
The vector-encoded intron is removed from the activated gene transcript during
RNA processing.

Splice Acceptor Trap (SAT) Vectors 

As an alternative approach for removing cells that fail to activate an
endogenous gene, the invention provides additional vectors designated herein as
"Splice Acceptor Trap" (SAT) vectors. These vectors are designed to splice from
a vector encoded splice donor site to an endogenous splice acceptor.
Furthermore, the vectors are designed to produce a product that is toxic to the
host cells (or a product that can be selected against) if splicing does not occur.
Thus, these vectors facilitate elimination of cells in which the vector-encoded exon
failed to splice to an endogenous exon.

The splice acceptor trap vectors can contain both a positive selectable
marker and a negative selectable marker gene oriented in the same direction on the
vector. As used herein, a positive selectable marker is a gene that, upon
expression, produces a protein capable of facilitating the isolation of cells
expressing the marker. Analogously, as used herein, a negative selectable marker
is a gene that, upon expression, produces a protein capable of facilitating removal
of cells expressing the marker.

The positive selectable marker and the negative selectable marker are
preferably separated in the vector construct by an unpaired splice donor site. In
other embodiments, however, the positive selectable marker may be fused to the
negative selectable marker gene. In this configuration, an unpaired splice donor
site is located between the positive and negative selectable marker, such that the
reading frame of the negative selectable marker is preserved. The unpaired splice
donor site is preferably located at the junction of the positive and negative
selectable markers. However, the unpaired splice donor site may be located
anywhere in the fusion gene such that upon splicing to an endogenous splice
acceptor site, the positive selectable marker will be expressed in an active form
and the negative selectable marker will be expressed in an inactive form, or not at
all. In this configuration, the positive selectable marker is located upstream of the
negative selectable marker.

It will also be apparent to one of ordinary skill in view of the description
contained herein that the positive and negative selectable markers on the SAT
vector need not be expressed as a fusion protein. In one embodiment, an internal
ribosomal entry site (ires) is inserted between the positive selectable marker and
the negative selectable marker. In this configuration, the unpaired splice donor
site can be positioned between the two markers, or in the open reading frame of
either marker gene such that, upon splicing, the positive selectable marker will be
expressed in an active form and the negative selectable marker will be expressed
in an inactive form, or not at all. In another embodiment, the positive selectable
marker may be driven from a different transcriptional regulatory sequence than the
negative selectable marker. In this configuration, the unpaired splice donor site
is located in the 5' untranslated region of the negative selectable marker or
anywhere in the open reading frame of the negative selectable marker such that,
upon splicing, the negative selectable marker will be produced in an inactive form
or not at all. Furthermore, when the positive and negative markers are driven
from different transcriptional regulatory sequences, the positive selectable marker
may be located upstream or downstream of the negative selectable marker, and the
positive selectable marker may contain or lack a splice donor site at its 3' end.

The vectors described herein may contain any positive selectable marker.
Examples of positive selectable markers useful in this invention include genes
encoding neomycin (neo), hypoxanthine phosphoribosyl transferase (HPRT),
puromycin (pac), dihydro-orotase, glutamine synthetase (GS), histidine D (his D),
carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug
resistance 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl
transferase (gpt), and adenosine deaminase (ada). Alternatively, the vectors may
contain a screenable marker in place of the positive selectable marker. Screenable
markers include any protein capable of producing a recognizable phenotype in the
host cell. Examples of screenable markers include cell surface epitopes (such as
CD2) and enzymes (such as β-galactosidase).

The vectors described herein may also, or alternatively, contain any
negative selectable marker that can be selected against. Examples of negative
selectable markers include hypoxanthine phosphoribosyl transferase (HPRT),
thymidine kinase (TK), and diphtheria toxin. The negative selectable marker can
also be a screenable marker, such as a cell surface protein or an enzyme. Cells
expressing the negative screenable marker may be removed by, for example,
Fluorescence Activated Cell Sorting (FACS) or magnetic bead cell sorting.

To isolate cells that have activated expression of an endogenous gene, the
cells containing the integrated vector can be placed under the appropriate drug
selection. Selection for the positive selectable marker and against the negative
selectable marker can occur simultaneously. In another embodiment, selection can
occur sequentially. When selection occurs sequentially, selection for the positive
selectable marker can occur first, followed by selection against the negative
selectable marker. Alternatively, selection against the negative selectable marker
can occur first, followed by selection for the positive selectable marker.
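
The selection schemes just described can be illustrated with a minimal sketch. It assumes an idealized model in which each selection step acts as a pure filter on marker activity; the `Cell` record, the function names, and the three example cells are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical record of marker activity in one transfected cell."""
    name: str
    positive_active: bool  # positive selectable marker produced in active form
    negative_active: bool  # negative selectable marker produced in active form


def select_for_positive(cells):
    """Keep only cells that produce the positive marker in active form."""
    return [c for c in cells if c.positive_active]


def select_against_negative(cells):
    """Remove cells that produce the negative marker in active form."""
    return [c for c in cells if not c.negative_active]


cells = [
    Cell("spliced", positive_active=True, negative_active=False),
    Cell("unspliced", positive_active=True, negative_active=True),
    Cell("no_marker", positive_active=False, negative_active=False),
]

# Simultaneous selection: both conditions are applied at once.
simultaneous = [c for c in cells if c.positive_active and not c.negative_active]

# Sequential selection, in either order.
pos_then_neg = select_against_negative(select_for_positive(cells))
neg_then_pos = select_for_positive(select_against_negative(cells))

print([c.name for c in simultaneous])
```

Under this idealized model the three schemes retain the same cells; in practice the choice between simultaneous and sequential selection may depend on factors the sketch does not model, such as drug toxicity or marker expression kinetics.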

The positive and negative markers are expressed by a transcriptional
regulatory element located upstream of the translation start site of each gene.
When a positive/negative marker fusion gene or an ires sequence is used, a single
transcriptional regulatory element drives expression of both markers. A poly(A)
signal may be placed 3' of each selectable marker. If a positive/negative fusion
gene is used, a single poly(A) signal is positioned 3' of the markers. Alternatively,
a poly(A) signal may be excluded from the vector to provide additional specificity
for a gene activation event (see dual poly(A)/splice acceptor trap below).

Dual Poly(A)/Splice Acceptor Trap Vectors

To further reduce the number of cells that lack a gene activation event, the
invention also provides vectors that confer host cell survival only if the vector-
encoded exon has spliced to an exon from an endogenous gene and has acquired
a poly(A) signal. These vectors are designated herein as "dual poly(A)/splice
acceptor trap vectors" or as "dual poly(A)/SAT vectors." By requiring both
splicing and polyadenylation to occur for cell survival, cells that fail to activate an
endogenous gene are more efficiently eliminated from the activation library.

The dual poly(A)/splice acceptor trap vectors contain a positive selectable
marker and a negative selectable marker configured as described for the SAT
vectors; however, neither gene contains a functional poly(A) signal. Thus, the
positive selectable marker will not be expressed at high levels unless splicing
occurs to capture an endogenous poly(A) signal. Aside from the lack of a poly(A)
signal, all other features and embodiments of this type of vector are the same as
those of the SAT vectors as described herein. Examples of dual poly(A)/SAT
vectors are shown in Figs. 9A-9F and 10A-10F. The nucleotide sequence of one
such dual poly(A)/SAT vector, designated pRIG22b, is shown in Fig. 16A-16B
(SEQ ID NO:20).

Vectors for Activating Protein Expression from Endogenous Genes 

In many applications of non-targeted gene activation, it is desirable to
produce protein from the activated endogenous gene. To accomplish this, a
second transcriptional regulatory sequence (which may be any transcriptional
regulatory sequence, including but not limited to the promoters, enhancers, and
repressors described herein, and which is preferably a promoter or an enhancer,
and most preferably a promoter) can be placed downstream of the selectable
marker(s) on any of the vectors described herein. When poly(A) trap vectors,
SAT vectors, or dual poly(A)/SAT vectors are used, the downstream
transcriptional regulatory sequence is positioned to drive expression in the same
direction as the upstream selectable marker(s). To activate expression of full-
length protein with this type of vector, however, the vector must integrate into the
5' UTR of the endogenous gene to avoid cryptic start ATG codons upstream of
exon I.

Alternatively, to increase the frequency of protein expression using non- 
targeted gene activation, the downstream transcriptional regulatory sequence on 
the vector may be operably linked to an exonic sequence followed by a splice 
donor site. In a preferred embodiment, the vector exon lacks a start codon. This 
vector is particularly useful for activating protein expression from genes that do 
not encode the translation start codon in exon I. In an alternative preferred
embodiment, the vector exon contains a start codon. Additional codons can be 
located between the translational start codon and the splice donor site. For 
example, a partial signal secretion sequence can be encoded on the vector exon. 
The partial signal sequence can be any amino acid sequence capable of 
complementing a partial signal sequence from an endogenous gene to produce a 
functional signal sequence. The partial sequence may encode between one and 
one hundred amino acids, and may be derived from existing genes, or may consist 
of novel sequences. Thus, this vector is useful for producing and secreting protein 
from genes that encode part of the endogenous signal sequence in exon I, and the 
remainder in subsequent exons. In another example of a vector useful for 
activating a particular type of endogenous gene, a functional signal sequence can 
be encoded on the vector exon. This vector allows protein to be produced and 
secreted from genes that encode a signal sequence in exon I. It can also be used 
to produce secreted forms of proteins that are not normally secreted. 

In cases where a start codon is included on the vector exon, it can be 
advantageous to produce a vector in each reading frame. This is achieved by
varying the number of nucleotides between the start codon and the splice donor
junction site. Together, the preferred vector configurations are capable of 
producing protein from endogenous genes, regardless of the exon/intron structure, 
location of the translation start codon, or reading frame. 
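
The three-frame strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the exon sequence `ATGGCT` and the filler nucleotide `G` are placeholders rather than sequences from this application, and the function names are hypothetical. The frame at the splice donor junction is taken to be the number of nucleotides left over after the last complete codon, counting from the start codon.

```python
def frame_at_donor(exon_seq: str) -> int:
    """Return the count of nucleotides (0, 1, or 2) remaining after the last
    complete codon, counting from the first ATG to the end of the exon,
    where the splice donor junction is assumed to lie."""
    start = exon_seq.find("ATG")
    if start < 0:
        raise ValueError("no start codon in exon")
    return (len(exon_seq) - start) % 3


def three_frame_exons(base_exon: str, filler: str = "G"):
    """Build three vector exons that differ by 0, 1, or 2 filler nucleotides
    between the start codon region and the splice donor junction."""
    return [base_exon + filler * n for n in range(3)]


exons = three_frame_exons("ATGGCT")  # placeholder exon: start codon + one codon
frames = [frame_at_donor(e) for e in exons]
print(list(zip(exons, frames)))
```

Because the three exons end in three different frames, one of them will be in frame with the downstream endogenous exon whatever its phase, which is the point of producing a vector in each reading frame.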

Vectors for Isolating Exon I from Activated Endogenous Genes 

The non-targeted gene activation vectors described above are useful for 
activating and isolating endogenous genes and for producing protein from 
endogenous genes. Upon integration upstream of an endogenous gene, however,
each of these vectors produces a transcript that lacks exon I from the endogenous 
gene. Since the vectors are designed to produce a transcript containing the vector 
encoded exon spliced to the first splice acceptor site downstream of the vector 
integration site, and since the first exon of eukaryotic genes does not contain a 
splice acceptor site, normally, the first exon of endogenous genes will not be 
recovered on mRNA molecules derived from non-targeted gene activation. For 
some genes, such as genes that contain coding information in the first exon, there 
is a need to efficiently recover the first exon of the activated endogenous gene. 

To recover the first exon of activated endogenous genes, a transcriptional 
regulatory sequence (which may be any transcriptional regulatory sequence, 
including but not limited to the promoters, enhancers, and repressors described 
herein, and which is preferably a promoter or an enhancer, and most preferably a 
promoter) is included on the activation vector downstream of a second 
transcriptional regulatory sequence (which may also be any transcriptional 
regulatory sequence, including but not limited to the promoters, enhancers, and 
repressors described herein, and which is preferably a promoter or an enhancer, 
and most preferably a promoter) which drives expression of a vector encoded 
exon. Thus, the upstream transcriptional regulatory sequence is linked to an 
unpaired splice donor site and the downstream transcriptional regulatory sequence 
is not linked to a splice donor site. Both transcriptional regulatory sequences are 
oriented to drive expression in the same direction. Examples of such exon I
recovery vectors are shown in Fig. 12A-12G. The integration of this type of
vector will create at least two different types of RNA transcripts (Figure 13). The
first transcript is derived from the upstream transcriptional regulatory sequence
and contains the vector exon spliced to exon II of an endogenous gene. The
second transcript is derived from the downstream transcriptional regulatory
sequence and contains, from 5' to 3', the region between the vector and the
transcription start site of the gene, exon I, exon II, and all downstream exons.
Using methods described herein, both transcripts can be recovered and analyzed,
allowing the characterization of exon I from genes isolated by non-targeted gene
activation.
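
As a rough illustration of the two transcript types, the sketch below composes them from an ordered list of exon labels. The function name and the labels (`vector_exon`, `upstream_region`, `exon_I`, and so on) are hypothetical and serve only to show which segments each transcript is expected to contain.

```python
def exon_one_recovery_transcripts(endogenous_exons,
                                  vector_exon="vector_exon",
                                  upstream_region="upstream_region"):
    """Return the two transcript types, 5' to 3', as lists of segment labels.

    endogenous_exons: ordered labels for the endogenous gene, exon I first.
    """
    if len(endogenous_exons) < 2:
        raise ValueError("a multi-exon gene (at least exon I and exon II) is assumed")
    # Transcript 1: upstream promoter, vector exon spliced to exon II onward.
    first = [vector_exon] + list(endogenous_exons[1:])
    # Transcript 2: downstream promoter, read-through into the gene, so the
    # region between the vector and the gene precedes exon I.
    second = [upstream_region] + list(endogenous_exons)
    return first, second


first, second = exon_one_recovery_transcripts(["exon_I", "exon_II", "exon_III"])
print(first)
print(second)
```

Only the second transcript carries exon I, which is why both transcripts are recovered and compared.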

The exon located on the activation vector can encode a selectable marker,
a protein, a portion of a protein, secretion signal sequences, a portion of a signal
sequence, an epitope, or nothing. When a protein is encoded by the exon, a
poly(A) signal may be included downstream of the vector encoded gene.
Alternatively, a poly(A) signal may be omitted. In another embodiment, a positive
and negative selectable marker may be operably linked to the upstream
transcriptional regulatory sequence(s). In this embodiment, the position of the
unpaired splice donor site relative to the selectable markers is described above for
the SAT vectors and the dual poly(A)/SAT vectors.



Gene Activation Vectors for Single-Exon and Multi-Exon Gene Trapping

As noted above, in one embodiment the poly(A) trap vectors of the
invention may contain a promoter operably linked to a selectable marker followed
by an unpaired splice donor site. Such vectors, when integrated into or near a
gene, produce transcripts containing the selectable marker spliced onto an
endogenous gene. Since the endogenous gene encodes a poly(A) signal, the
resulting mRNA is polyadenylated, thereby allowing the transcript to be translated
at levels sufficient to confer drug resistance on the cell containing the integrated
vector.

While the vectors described above are capable of "trapping" endogenous
genes, the splice donor site downstream of a selectable marker cannot be used in,
and in some cases can interfere with, several potential applications for such
vectors. First, these vectors cannot be used to selectively trap single exon genes,
since these genes do not contain a splice acceptor site. Second, these vectors
often "trap" cryptic genes, since drug resistance relies solely on vector integration
upstream of a poly(A) signal. Unfortunately, cryptic poly(A) signals exist in the
genome, leading to formation of drug resistant cells and creation of non-genic
transcripts containing the selectable marker. These cells and transcripts can
interfere with gene discovery applications using these vectors. Third, without
novel modifications such as those described herein (see above), these vectors are
not capable of efficiently producing protein from the activated endogenous gene.
Furthermore, protein expression from an endogenous gene can be poor even when
an internal ribosome entry site (ires) is included between the selectable marker and
the splice donor site, since translation from an ires is generally less efficient than
translation from the first start codon at the 5' end of a transcript. Thus, there is
a need for vectors that are capable of more specifically trapping endogenous
genes, including single exon genes, and that are capable of efficiently expressing
protein from the activated endogenous genes.

Thus, in additional embodiments, the present invention provides such
vectors. In one such embodiment, the vector may contain a promoter operably
linked to one or more (i.e., one, two, three, four, five, or more) selectable
markers, wherein the selectable marker is not followed by a splice donor site or
a poly(A) signal (see Figures 17A-17G). In general, upon integration into a host
cell genome, this vector will fail to produce sufficient quantities of selectable
marker since the marker transcript will not be polyadenylated. However, if the
vector integrates in close proximity to, or into, a gene, including a single exon
gene, the selectable marker will acquire a poly(A) signal from the endogenous
gene, thereby stabilizing the marker transcript and conferring a drug resistant
phenotype on the cell. In addition to selecting for vector integration into or near
genes, vectors according to this aspect of the invention can also be used to recover
exon I from the activated gene, as described in the section of this application
entitled "Vectors for Isolating Exon I from Activated Endogenous Genes."

In a preferred embodiment, the vector can contain a second selectable 
marker upstream of the first selectable marker (see Figure 18). The upstream 
selectable marker is preferably operably linked to a transcriptional regulatory 
sequence, most preferably a promoter. Optionally, an unpaired splice donor site 
can be positioned between the transcription start site and the translation start site 
of the upstream selectable marker. Alternatively, the splice donor site may be 
located anywhere in the open reading frame of the upstream selectable marker, 
such that, following vector integration into a host cell genome, and upon splicing 
from the vector encoded splice donor site to an endogenous exon, the upstream 
selectable marker will be produced in an inactive form, or not at all. By selecting 
for cells that produce the downstream positive selectable marker in an active form, 
cells containing the vector integrated into or near a gene can be isolated. 
Furthermore, by selecting against cells producing the upstream selectable marker 
in the active form, cells in which the vector transcript has spliced to an exon from 
a multi-exon endogenous gene can be removed. In other words, these vectors can 
be used to isolate cells that contain a vector integrated into a single exon gene or 
into the 3'-most exon of a multi-exon gene since, in these instances, a splice
acceptor site is absent between the vector encoded splice donor site and the
endogenous poly(A) signal. Thus, the majority of cells containing activated
multi-exon genes will not survive selection, and as a result, cells containing 
activated single exon genes will be greatly enriched in the library. 

In another preferred embodiment, vectors according to this aspect of the 
invention may contain one or more (i.e., one, two, three, four, five, or more, and 
preferably one) negative selectable marker(s) upstream of the first selectable 
marker (see Figures 19A and 19B). The negative selectable marker preferably is 
operably linked to a promoter. Optionally, an unpaired splice donor site may be 
positioned between the transcription start site and the translation start site of the
negative selectable marker. Alternatively, the splice donor site may be located
anywhere in the open reading frame of the negative selectable marker, such that, 
following vector integration into a host cell genome, and upon splicing from the 
vector encoded splice donor site to an endogenous exon, the negative selectable 
marker will be produced in an inactive form, or not at all. By selecting for cells 
that produce the positive selectable marker in an active form and selecting against 
cells producing the negative selectable marker in the active form, these vectors can 
be used to identify cells containing the vector integrated into or upstream of an 
endogenous gene. Since (1) splicing to an endogenous exon and (2) acquisition
of a poly(A) signal are both required for cell survival, cells containing cryptic
gene trap events are reduced within the library. The reason for this is that the
probability of a vector integrating next to both a cryptic splice acceptor site and
a cryptic poly(A) signal is substantially less than the probability of the vector
integrating next to a single cryptic site. Thus, these vectors provide a higher 
degree of specificity for trapping genes than previous vectors. 
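
The probability argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that integration next to a cryptic splice acceptor site and integration next to a cryptic poly(A) signal are independent events; the probabilities used are hypothetical and are not measurements from this application.

```python
def dual_trap_false_positive(p_cryptic_acceptor: float, p_cryptic_polya: float) -> float:
    """Probability that one integration lies next to BOTH a cryptic splice
    acceptor site and a cryptic poly(A) signal, assuming independence."""
    for p in (p_cryptic_acceptor, p_cryptic_polya):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_cryptic_acceptor * p_cryptic_polya


# Hypothetical per-integration probabilities, chosen only for illustration.
p_sa = 0.01
p_pa = 0.02
p_both = dual_trap_false_positive(p_sa, p_pa)
print(p_both, "versus", min(p_sa, p_pa))
```

Under the independence assumption, the joint probability can never exceed the smaller of the two single-site probabilities, which is consistent with the reduction in cryptic gene trap events described above.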

It will also be recognized by one of ordinary skill in view of the teachings 
contained herein that vectors containing positive and negative selectable markers 
can be used to produce protein from the activated endogenous gene. One vector 
configuration capable of directing protein production consists of the splice donor 
site positioned in the 5' UTR of the negative selectable marker. Upon splicing, 
a chimeric transcript containing the 5' UTR from the negative selectable marker 
linked to the second exon of an endogenous gene is produced. This vector is 
capable of activating protein production from genes that encode a translation start 
codon in the second or subsequent exon. Likewise, the splice donor site can be 
placed in the open reading frame of the negative selectable marker, in a position 
that does not interfere with the function of the marker unless splicing has 
occurred. Similar vectors containing the splice donor site positioned in different 
reading frames relative to the translation start codon can also be used. Upon 
splicing to an endogenous gene, these vectors will produce a chimeric transcript 
containing a start codon from the negative selectable marker fused to exon II of
the activated endogenous gene. Thus, these vectors will be capable of activating
protein expression from genes that encode a translation start codon in exon I. 
Additional positive/negative selection vector designs capable of efficiently 
producing protein from activated endogenous genes are described below. 

Any of the vectors of the invention can contain an internal ribosome entry
site (ires) 3' of the downstream selectable marker. The ires allows translation of
the endogenous gene upon vector integration into an endogenous gene.
Optionally, a translation start codon may be included between the selectable
marker and the ires sequence. When a start codon is present, additional codons
may be present on the exon. The start codon, and if present additional codons,
may be present in any, and collectively all, reading frames relative to the splice
donor site. Furthermore, the codons downstream of the translation start codon,
if present, may encode, for example, a signal secretion sequence, a partial signal
sequence, a protein (including a full-length protein, a portion of a protein, a
protein motif, an epitope tag, etc.), or a spacer region.

In additional preferred embodiments, any of the vectors described herein
may contain, upstream of the selectable marker(s), a second transcriptional
regulatory sequence (most preferably a promoter) operably linked to an exonic
region, followed by an unpaired splice donor site. This upstream exon is
particularly useful for expressing protein from activated endogenous genes. The
exon may lack a translation start codon. Alternatively, the exon may contain a
translation start codon. When a start codon is present, additional codons may be
present on the exon. The start codon, and if present additional codons, may be
present in any, and collectively all, reading frames relative to the splice donor site.
Furthermore, the codons downstream of the translation start codon, if present,
may encode, for example, a signal secretion sequence, a partial signal sequence,
a protein (including a full-length protein, a portion of a protein, a protein motif,
an epitope tag, etc.), or a spacer region.

Activation Vectors Useful for Detecting Protein-protein Interactions

Genetic approaches for detecting protein-protein interactions have previously been 
described (see, e.g., U.S. Patent Nos. 5,283,173; 5,468,614; and 5,667,973, the 
disclosures of which are fully incorporated herein by reference). This approach
relies on cloning a first cDNA molecule next to, and in frame with, a gene 
fragment encoding a DNA binding domain; and cloning a second cDNA molecule 
next to, and in frame with, a gene fragment encoding a transcription 
transactivation domain. Each chimeric gene is expressed from a promoter region 
located upstream of the chimeric gene. To detect expression, both chimeric genes 
are transfected into a reporter cell. If the first chimeric protein interacts with the 
second chimeric protein (via the proteins encoded by the cloned cDNAs fused to
the DNA binding and transcription activation domains), then the DNA binding 
domain and the transcription activation domain will be joined within a single 
protein complex. As a result, the protein-protein interaction complex can bind to 
the regulatory region of the reporter gene and activate its expression. 

A limitation of this previous approach is that it is only capable of detecting 
protein-protein interactions between genes that have been cloned as cDNA. As 
described herein, many genes are expressed at very low levels, in rare cell types, 
or during short developmental windows; and therefore, these genes are typically 
absent from cDNA libraries. Furthermore, many genes are too large to be isolated
efficiently as full-length clones, thereby making it difficult to use these previous 
approaches. 

The present invention is capable of activating protein expression from 
endogenous genes or from transfected genomic DNA. Unlike previous 
approaches, virtually any gene can be efficiently expressed, regardless of its 
normal expression pattern. Furthermore, since the present invention is also 
capable of modifying the protein expressed from the endogenous gene (or from 
the transfected genomic DNA), it is also possible to produce chimeric proteins for 
use in protein-protein interaction assays.

To detect protein-protein interactions by the present invention, two
vectors are used. The first vector, generally referred to as BD/SD (binding 
domain/splice donor), contains a promoter operably linked to a polynucleotide 
encoding a DNA binding domain and an unpaired splice donor site. The second 
vector, generally referred to as AD/SD (activation domain/splice donor), contains 
a promoter operably linked to a polynucleotide encoding a transcription activation 
domain and an unpaired splice donor site. To accommodate genes that have 
different reading frames, the binding domain and activation domain can be 
encoded in each of the three possible reading frames relative to the unpaired splice 
donor site. In addition, BD/SD and AD/SD vectors can have other functional 
elements, as described herein for other vectors, including selectable markers and 
amplifiable markers. The vectors may also contain selectable markers oriented in 
a configuration that permits selection for cells in which the vector has activated 
a gene. Multi-promoter/activation exon vectors are also useful. Several examples 
of BD/SD and AD/SD vectors are illustrated in Figure 25. An example illustrating 
detection of a protein-protein interaction using these vectors is depicted in 
Figure 26. 

The DNA binding domain of the BD/SD vector may encode any protein 
domain capable of binding to a specific nucleotide sequence. When a transcription 
activation protein is used to supply the DNA binding domain, the transcription 
activation domain is omitted from the BD/SD vector. Examples of genes 
encoding proteins with DNA binding domains include, but are not limited to, the 
yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes 
from prokaryotic and eukaryotic sources may also be used to supply DNA binding 
domains. 

The transcription activation domain of the AD/SD vector encodes a 
protein domain capable of enhancing transcription of a reporter gene when 
positioned near the promoter region of the reporter gene. When a transcription 
activation protein is used to supply the transcription activation domain, the DNA 
binding domain is omitted from the AD/SD vector. Examples of genes encoding
proteins with transcription activation domains include, but are not limited to, the
yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes 
from prokaryotic and eukaryotic sources may also be used to supply transcription 
activation domains. 

In the present invention, protein-protein interactions are detected using the 
BD/SD and AD/SD vectors, described above, to activate expression of genes 
located in stretches of genomic DNA. 

In one embodiment, the BD/SD vector is integrated randomly into the 
genome of a reporter cell line. As with other vectors described herein, the BD/SD 
vectors are capable of activating protein expression from genes located 
downstream of the vector integration site. Since the activation exon on the 
BD/SD vector encodes a DNA binding domain, the activated endogenous protein 
will be produced as a fusion protein containing the DNA binding domain at its N- 
terminus. Thus, by integrating the BD/SD vector into the genome of a host cell, 
a library of fusion proteins can be created, wherein each protein will contain a 
DNA binding domain at its N-terminus. 

It is also recognized that the AD/SD vector can be integrated into the 
genome of a reporter cell line to produce a library of cells, wherein each member
of the library expresses a different endogenous gene fused to a transcription
activation domain.

Once created, the BD/SD library may be transfected with a vector 
expressing a specific gene (referred to below as gene X) fused to a transcription 
activation domain. This allows virtually any gene encoded in the genome to be 
tested for an interaction with gene X. Likewise, the AD/SD library may be
transfected with a vector expressing a specific gene (e.g. gene X) fused to a DNA
binding domain. This allows virtually any gene encoded in the genome to be
tested for an interaction with gene X. It is also recognized that the specific gene
may be stably expressed in the host cell prior to construction of the BD/SD or
AD/SD libraries.

In an alternative embodiment, genomic DNA is cloned into the BD/SD
and/or AD/SD vector(s) downstream of the DNA binding domain and activation 
domain, respectively. If a gene is present and correctly oriented in the genomic 
DNA, then the BD/SD vector (or the AD/SD vector) will be capable of expressing 
the gene as a fusion protein useful for detecting protein-protein interactions. Like 
integration of BD/SD (or AD/SD) vectors in situ, any gene can be tested 
regardless of whether it has been previously isolated as a cDNA molecule. 

In another embodiment, a second library is created in the cells of the first 
library. For example, the AD/SD vector can be integrated into cells comprising 
the BD/SD library. Conversely, the BD/SD vector can be integrated into cells 
comprising the AD/SD library. This allows all proteins expressed as binding
domain fusion proteins to be tested against all activation domain fusion proteins.
Since the present invention is capable of expressing substantially all of the proteins 
(as fusions with the binding and activation domains) in a eukaryotic organism, this 
approach, for the first time, allows all combinations of protein-protein interactions 
to be tested in a single library. To survey all protein-protein interactions in an 
organism, the library within a library must be substantially comprehensive. For 
example, to detect ~50% of protein-protein interactions in an organism containing
100,000 genes, the first library must contain at least 100,000 cells, each expressing
an activated gene. Within each clone of the first library, the second vector would
then be used to create a library of at least 100,000 clones, each containing an
activated gene. Thus, the total library would contain 100,000 clones x 100,000
clones, or 10^10 total clones. This assumes all genes are activated at equal
frequencies, and that each gene activation event results in production of a fusion 
protein in frame with the activated endogenous gene. To produce libraries with 
greater than 50% coverage of protein-protein interactions, and/or to ensure that 
proteins that are activated at lower frequencies are represented, larger libraries can 
be created. 
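
The library-size arithmetic in this example can be checked with a short sketch. The function name is hypothetical; the figures of 100,000 clones per library are those given in the example above, and the calculation simply multiplies the size of the first library by the number of second-library clones created within each of its members.

```python
def library_within_library_size(first_library_clones: int,
                                second_library_clones_per_first: int) -> int:
    """Total clones when a second library is built inside every clone of the first."""
    if first_library_clones < 0 or second_library_clones_per_first < 0:
        raise ValueError("clone counts must be non-negative")
    return first_library_clones * second_library_clones_per_first


total = library_within_library_size(100_000, 100_000)
print(f"{total:.0e} total clones")
```

Because the total grows as the product of the two library sizes, doubling both libraries to improve coverage would quadruple the number of clones to be handled.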

It is also recognized that library vs. library screens can be created in several 
ways. First, both libraries are produced, simultaneously or sequentially, by
integrating BD/SD and AD/SD vectors into the genome of the same reporter cells.
Second, a first library is created by integrating a BD/SD vector into the genome 
of a reporter cell, and a second library is produced by transfecting the AD/SD 
vector containing cloned genomic DNA. It is recognized that in this approach, the 
AD/SD library may be created first, followed by introduction of a BD/SD vector 
containing cloned genomic DNA. It is also recognized that the first library can be 
created by transfecting the BD/SD vector (or AD/SD vector) containing cloned 
genomic DNA, followed by integrating the second vector into the reporter cell 
genome. Third, both libraries are created, simultaneously or sequentially, by 
transfecting cells with BD/SD and AD/SD vectors, wherein each vector contains
a cloned fragment of genomic DNA. Fourth, it is recognized that when cloned 
genomic fragments are used in either the BD/SD vector or the AD/SD vector, a 
cDNA library may be created in the other vector and introduced into cells. This 
allows all of the genes present in the cDNA library to be tested for interaction with 
all other genes in the genome. 

Since library/library screens involve the creation of large libraries of cells, 
it is important to maximize the frequency of gene activation and in frame fusion 
protein production among the members of the library. This can be accomplished 
in at least three ways. First, the BD/SD and AD/SD vectors can contain selectable
markers in a configuration that "traps" genes. Examples of selection trap vectors 
are shown in Figures 8, 9, 10, 17, 19, 21, and 25. These vectors select for cells 
in which the activation vector has transcriptionally activated a gene. Second, 
multiple promoter/activation exon units can be included on the BD/SD and 
AD/SD vectors. Each promoter/activation exon unit encodes the binding domain 
(or activation domain) in a different reading frame relative to the unpaired splice 
donor site. An example of a multi-promoter/exon vector is illustrated in Figure 23.
This type of vector ensures that any gene activated at the transcription level will
be produced as an in frame fusion protein from one of the promoter/activation exon
units on the vector. Third, the vectors can be introduced into the reporter cells
using efficient transfection procedures. In this respect, insertion of BD/SD and
AD/SD vectors by retroviral integration is advantageous. 

Reporter cells useful in the present invention include any cell that is 
capable of properly splicing the transcripts produced by the BD/SD and AD/SD 
vectors. The reporter cells contain a reporter gene that is expressed at higher 
levels in the presence of a protein-protein interaction between proteins expressed 
from BD/SD and AD/SD vectors. The reporter gene may be a selectable marker, 
such as any of the markers described herein. Alternatively, the reporter gene may 
be a screenable marker. Examples of useful selectable markers and screenable 
markers are described herein. 

In the reporter cells, a minimal promoter is operably linked to the reporter 
gene. To allow increased expression of the reporter gene in the presence of a 
protein-protein interaction, a DNA binding site is positioned in or near the minimal 
promoter, such that the DNA binding site is recognized by the protein encoded by 
the DNA binding domain region of the BD/SD vector. In the absence of a 
protein-protein interaction, the DNA binding domain fusion protein produced 
from BD/SD lacks a transcription activation domain, and therefore, cannot
activate transcription from the minimal promoter of the reporter gene. If, 
however, the DNA binding domain fusion protein produced from BD/SD interacts 
with the activation domain fusion protein produced from the AD/SD vector, then 
the protein complex can activate expression of the reporter gene. Increased 
reporter gene expression can be detected using an assay for the screenable marker, 
or using drug selection for a selectable marker. 
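
The reporter logic described above can be summarized as a small truth table. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and the model ignores expression levels and background activity.

```python
def reporter_activated(bd_fusion_binds_site: bool,
                       ad_fusion_present: bool,
                       fusions_interact: bool) -> bool:
    """Simplified model of the reporter cell described in the text.

    The BD fusion alone occupies the DNA binding site but carries no
    activation domain. The minimal promoter is activated only when the
    BD fusion also interacts with an AD fusion.
    """
    return bd_fusion_binds_site and ad_fusion_present and fusions_interact


# BD fusion bound, AD fusion expressed, but no interaction: reporter stays off.
no_interaction = reporter_activated(True, True, False)
# All three conditions met: reporter is expected to be expressed.
interaction = reporter_activated(True, True, True)
```

In this simplified view, selection or screening for the reporter recovers only cells in which all three conditions hold.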

It is also recognized that other reporter systems can be used in conjunction 
with the present invention to detect protein-protein interactions. Specifically, any 
protein that contains two separable domains, each required to be in close 
proximity with the other to produce a biochemical or structural activity, can be 
used in conjunction with the present invention. 

Multi-Promoter/Activation Exon Vectors

In applications of nontargeted gene activation in which the goal is to 
activate protein expression from an unknown gene, a collection of vectors 
typically must be used. Thus, in an additional embodiment, the invention provides 
vectors containing one or more promoter/activation exon units (see Figures 20A-20E).

To accommodate the variety of gene structures that exist in the genomes 
of eukaryotic cells, vectors according to this aspect of the invention preferably 
contain a transcriptional regulatory sequence (e.g., a promoter) operably linked
to an activation exon with a different structure. Collectively, these activation
exons are capable of activating protein expression from substantially all
endogenous genes. For example, to activate protein expression from genes that 
encode a translation start codon in exon II (or exons downstream of exon II), one 
vector can contain a transcriptional regulatory sequence (e.g., a promoter) 
operably linked to an activation exon lacking a translation start codon. To
activate protein expression from all types of genes that encode a translation start
codon in exon I, three separate vectors must be used, each containing a 
transcriptional regulatory sequence (e.g., a promoter) operably linked to a 
different activation exon. Each activation exon encodes a start codon in a 
different reading frame. Additional activation exon configurations are also useful.
For example, to activate protein expression and secretion from genes that encode
a portion of their signal secretion sequence in exon I, three separate vectors must 
be used, each containing a transcriptional regulatory sequence (e.g., a promoter) 
operably linked to a different activation exon. Each activation exon encodes a 
partial signal sequence in a different reading frame. To activate protein expression
and secretion from genes that encode their entire signal sequence in exon I, three
vectors must be used, each containing a transcriptional regulatory sequence (e.g.,
a promoter) operably linked to a different activation exon. Each activation exon 
contains an entire signal secretion sequence in a different reading frame. In 
addition to activating expression of genes that encode secreted proteins,
promoter/activation exons encoding entire signal sequences will also activate
expression and secretion of proteins that are not normally secreted. This, for 
example, can facilitate protein purification of proteins that are normally 
intracellularly localized. 

Other useful coding sequences can be included on the activation exon of 
vectors according to this aspect of the invention, including but not limited to 
sequences encoding proteins (including full length proteins, portions of proteins, 
protein motifs, and/or epitope tags). As described herein, vectors according to 
this aspect of the invention can be integrated, individually or collectively, into the 
genome of a host cell to produce a library of cells. Each member of the library 
will potentially overexpress a different endogenous protein. Thus, these 
collections of vectors make it possible to activate all or substantially all of the 
endogenous genes in a eukaryotic host cell. 

When integrating a collection of vectors into host cells, as described 
above, activation of protein expression can be achieved from substantially any 
gene. Unfortunately, to produce protein from all endogenous genes, a large 
number of library members must be generated. In part, this is due to the large 
number of genes encoded by the host cell. In addition, using this approach, many 
cells will contain a vector integrated into or near an endogenous gene; however, 
the integrated vector will contain an activation exon with a structure that is 
incompatible with activating protein expression from the endogenous gene. For 
example, the vector exon may encode a start codon in reading frame 1 (relative 
to the splice junction), whereas the protein encoded by the first exon downstream 
of the integrated vector may be in reading frame 2 (relative to the splice junction). 
Thus, many library members will contain an integrated vector that has activated 
transcription of an endogenous gene, but that failed to produce the protein 
encoded by the endogenous gene. 

To decrease the number of cells that fail to activate protein expression
following vector integration into or near an endogenous gene, a vector containing
multiple promoter/activation exons can be used. On this vector, each
promoter/activation exon unit can be capable of activating protein expression from
an endogenous gene with a different structure. Since a single vector comprising 
multiple activation exons is capable of producing multiple transcripts, each 
containing a different activation exon, a single vector integrated into or near a 
gene can be capable of activating protein expression, regardless of the structure 
of the endogenous gene (see Figure 21). 
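
The reading-frame argument above can be checked with a short calculation. The Python sketch below is a minimal illustration under stated assumptions: each activation exon is reduced to the number of nucleotides (0, 1, or 2) of an incomplete codon at its 3' end, each downstream exon to the number of nucleotides at its 5' end that complete a codon, and the three downstream phases are treated as equally likely. The function names are hypothetical.

```python
from fractions import Fraction


def in_frame(vector_overhang: int, exon_leading: int) -> bool:
    """True when the partial codon ending the vector exon is completed by the
    bases that begin the downstream exon (both counted in nucleotides, 0-2)."""
    return (vector_overhang + exon_leading) % 3 == 0


def productive_fraction(vector_overhangs, exon_leading_values=(0, 1, 2)) -> Fraction:
    """Fraction of downstream exon phases that at least one activation exon
    on the vector can read through in frame."""
    hits = sum(
        any(in_frame(v, e) for v in vector_overhangs)
        for e in exon_leading_values
    )
    return Fraction(hits, len(exon_leading_values))


# One activation exon (a single reading frame) versus three activation exons
# that together cover all three reading frames.
single_frame = productive_fraction([0])
three_frames = productive_fraction([0, 1, 2])
print(single_frame, three_frames)
```

Under these assumptions a single-frame vector is compatible with one third of downstream exon phases, whereas a vector carrying all three frames is compatible with every phase, which is the motivation given in the text for placing multiple promoter/activation exon units on one vector.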

Multi-promoter/activation exon vectors can contain two or more 
promoter/activation exons. Each promoter/activation exon unit may be followed 
by an unpaired splice donor site. In one such embodiment, two 
promoter/activation exons are included on the vector, wherein each 
promoter/activation exon is capable of activating protein expression from a 
different type of endogenous gene. In a preferred embodiment, the vector may 
contain three promoter/activation exons, wherein each exon encodes a translation 
start codon in a different reading frame. In another preferred embodiment, the 
vector may contain three promoter/activation exons, wherein each exon encodes 
a partial signal secretion sequence in a different reading frame. In yet another 
preferred embodiment, the vector may contain three promoter/activation exons, 
wherein each exon encodes an entire signal secretion sequence in a different 
reading frame. Additional embodiments include each of the vectors above 
containing a fourth promoter/activation exon, wherein the fourth activation exon 
does not encode a translation start codon. 

Any number (e.g., one or more, two or more, three or more, four or more,
five or more, etc.) of promoter/activation exon units may be included on the
vector. When multiple promoter/activation exons are present on a single vector,
they are preferably oriented in the same direction relative to one another (i.e., the
promoters drive expression in the same direction). 

The promoters that drive transcription of different activation exons may 
be the same as one another or one or more promoters may be different. The 
promoters may be viral, cellular, or synthetic. The promoters may be constitutive 
or inducible. Other types of promoters and regulatory sequences, recognizable to 
one skilled in the art or as described herein, may also be used in preparing the
vectors according to this aspect of the invention. 

Any of the vectors containing multiple promoter/activation exon units may 
optionally include one or more selectable marker(s) and/or amplifiable marker(s). 
The selectable and/or amplifiable markers may contain a poly(A) signal. 
Alternatively, the markers may lack a poly(A) signal. The selectable marker may
be a positive or negative selectable marker. The selectable marker may contain 
an unpaired splice donor site upstream, within, or downstream of the marker. 
Alternatively, the selectable marker may lack an unpaired splice donor site. The 
selectable marker(s) and/or amplifiable marker(s), when present, may be located 
upstream, among, or downstream of the promoter/activation exon units. The 
selectable and/or amplifiable marker(s) may be located on the vector in any 
orientation relative to the promoter/activation exon units. When the purpose of 
the selectable marker is to trap endogenous genes, the selectable marker is 
preferably oriented in the same direction as the promoter/activation exons. 

Amplifiable Markers 

Any of the vectors described herein may also optionally comprise one or 
more (e.g., two, three, four, five, or more) amplifiable markers. Examples of 
amplifiable markers include those described in detail hereinabove. Preferably, the 
amplifiable marker(s) are located upstream of the positive/negative selectable 
marker(s). When using polyadenylation trap vectors, it may be advantageous to 
omit a polyadenylation signal from the amplifiable marker(s) to eliminate the 
possibility of capturing a vector-encoded poly(A) signal derived from vector 
concatemerization prior to integration. 

When present, the amplifiable marker(s) may be located upstream of the 
activation transcriptional regulatory sequence (i.e. the promoter responsible for 
directing transcription from the vector through the endogenous gene). The 
amplifiable marker(s) may be present on the vector in any orientation (i.e. the open 
reading frame may be present on either DNA strand). 

It is also understood that the amplifiable marker(s) can be the same
gene as the positive selectable marker. Examples of genes that can be used both
as positive selectable markers and amplifiable markers include dihydrofolate
reductase, adenosine deaminase (ada), dihydro-orotase, glutamine synthase (GS),
and carbamyl phosphate synthase (CAD).

In some embodiments and for certain applications, it may be desirable to 
place multiple amplifiable markers on the vector. Use of more than one 
amplifiable marker allows dual selection, or alternatively sequential selection, for 
each amplifiable marker. This facilitates the isolation of cells that have amplified 
the vector and flanking genomic locus, including the gene of interest.



Promoters 

It is understood that any promoter and regulatory element may be used on 
these activation vectors to drive expression of the selectable marker, amplifiable 
marker (if present), and/or the endogenous gene. In additional preferred
embodiments, the promoter driving expression of the endogenous gene is a strong
promoter. The CMV immediate early gene promoter, SV40 T antigen promoter,
and β-actin promoter are examples of this type of promoter. In another preferred
embodiment, an inducible promoter is used to drive expression of the endogenous
genes. This allows endogenous proteins to be expressed in a more controlled
fashion. The tetracycline inducible promoter, heat shock promoter, ecdysone
promoter, and metallothionein promoter are examples of this type of promoter.
In yet another embodiment, a tissue specific promoter is used to drive expression
of endogenous genes. Examples of tissue specific promoters include, but are not
limited to, immunoglobulin promoters, casein promoter, and growth hormone
promoter.

Restriction Sites 

The vectors of the invention can contain one or more restriction sites 
located downstream of the unpaired splice donor site in the vector. These 
restriction sites can be used to linearize plasmid vectors prior to transfection. In
the linear configuration, the activation vector contains, from 5' to 3' relative to the
transcribed strand, a promoter, a splice donor site, and a linearization site. 

A restriction site(s) may also be included in the vector intron to facilitate 
removal of vector intron-containing cDNA molecules. In this embodiment, the 
vector contains, from 5' to 3' relative to the transcribed strand, a promoter, a
splice donor site, a restriction site, and a linearization site. By including a 
restriction site between the unpaired splice donor site and the linearization site, 
unspliced transcripts can be removed by digestion of cDNA with the appropriate 
restriction enzyme. cDNA molecules derived from gene activation have removed 
the vector intron containing the restriction site, and therefore, will not be digested. 
This allows gene activated transcripts to be preferentially enriched during 
amplification/cloning, and greatly facilitates identification and analysis of 
endogenous genes. 
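
The enrichment step described above can be sketched as a simple filter. In the following Python illustration, all sequences are invented placeholders and the function name is hypothetical; the only real element is the Not I recognition sequence (GCGGCCGC), chosen here as an example of a site placed in the vector intron. Because that site is palindromic, checking one strand is sufficient in this sketch.

```python
NOTI_SITE = "GCGGCCGC"  # Not I recognition sequence (8 bp, palindromic)

# Invented placeholder sequences, for illustration only.
VECTOR_EXON = "ATGGCTAGCAAG"
VECTOR_INTRON = "GTAAGTTTTT" + NOTI_SITE + "TTTTT"  # intron carries the site
ENDOGENOUS_EXON = "GATCCTGAAACC"


def survives_digest(cdna: str, site: str) -> bool:
    """A cDNA stays intact only if it lacks the recognition site."""
    return site not in cdna.upper()


cdna_pool = {
    # Vector exon spliced to an endogenous exon: the intron (and site) is gone.
    "gene_activated": VECTOR_EXON + ENDOGENOUS_EXON,
    # Unspliced transcript: the vector intron, and therefore the site, remains.
    "unspliced": VECTOR_EXON + VECTOR_INTRON,
}

enriched = [name for name, seq in cdna_pool.items()
            if survives_digest(seq, NOTI_SITE)]
print(enriched)
```

Only the spliced, gene-activated cDNA passes the filter, mirroring the way digestion removes vector intron-containing molecules before amplification or cloning.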

A restriction site(s) may also be included in the vector exon to facilitate 
cloning of activated genes. Following gene activation, mRNA is recovered from 
cells and synthesized into cDNA. By digesting the cDNA with a restriction 
enzyme that cuts in the vector exon, gene activated cDNA molecules will contain 
an appropriate overhang at the 5' end for subsequent cloning into a suitable 
vector. This facilitates isolation of gene activated cDNA molecules. 

In one embodiment, the restriction site located in the vector exon is 
different than the restriction site(s) located in the vector intron. This facilitates 
removal of cDNA molecules that contain a vector intron since the digested cDNA
fragments from vector intron containing transcripts can be designed to have an 
overhang that is incompatible with the cloning vector (see below). Alternatively, 
degenerate restriction sites recognized by the same enzyme may be located in the 
vector exon and intron. Enzymes that cleave these sites are capable of cleaving 
multiple sites, sites with an odd number of bases in the recognition sequence, sites 
with interrupted palindromes, nonpalindromic sequences, or sites containing one 
or more degenerate bases. In other words, restriction sites recognized by the same 
restriction endonuclease may be used if the enzyme produces an overhang in the
vector exon that is different from the overhang produced in the vector intron. 
Since different overhangs are produced, a cloning vector containing a site that is 
compatible with the vector exon overhang, and incompatible with the vector intron
overhang may be used to preferentially clone vector exon containing and vector
intron lacking cDNA molecules. Examples of useful degenerate restriction sites
include DNA sequences recognized by Sfi I, Acc I, Afl III, Sap I, Ple I, Tsp45 I,
ScrF I, Tse I, PpuM I, Rsr II, and SgrA I. 
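
To illustrate how one enzyme can leave different overhangs at two sites, the Python sketch below uses Sfi I, which recognizes GGCCNNNN^NGGCC and leaves a 3-nucleotide 3' overhang drawn from the unspecified bases. The two example sites and the function name are hypothetical; only the recognition pattern and cut position are taken from the enzyme's known specificity.

```python
import re

# Sfi I site: GGCC, five unspecified bases, GGCC.
SFII_SITE = re.compile(r"GGCC([ACGT]{5})GGCC")


def sfii_overhang(seq: str) -> str:
    """Return the 3-nt 3' overhang left on the top strand of the upstream
    fragment after cleavage at the first Sfi I site in seq.

    The top strand is cut after the 4th unspecified base and the bottom
    strand after the 1st (in top-strand coordinates), so bases 2-4 of the
    spacer remain single stranded.
    """
    match = SFII_SITE.search(seq.upper())
    if match is None:
        raise ValueError("no Sfi I site found")
    spacer = match.group(1)
    return spacer[1:4]


# Hypothetical sites: one placed in the vector exon, one in the vector intron.
exon_site = "TTGGCCAAAATGGCCTT"
intron_site = "TTGGCCTCGAGGGCCTT"

exon_overhang = sfii_overhang(exon_site)
intron_overhang = sfii_overhang(intron_site)
print(exon_overhang, intron_overhang)
```

Because the two overhangs differ, a cloning vector prepared with an end complementary to the exon overhang would not be expected to ligate to fragments cut at the intron site.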

The restriction site(s) located in the vector intron and/or exon can be a rare
restriction site (e.g. an 8 bp restriction site) or an ultra-rare site (e.g. a site
recognized by intron encoded nucleases). Examples of restriction enzymes with
8 bp recognition sites include Not I, Sfi I, Pac I, Asc I, Fse I, Pme I, Sgf I, Srf I, Sbf I,
Sse8387 I, and Swa I. Examples of intron encoded restriction enzymes include
I-Ppo I, I-Sce I, I-Ceu I, PI-Psp I, and PI-Tli I. Alternatively, restriction sites smaller
than 8 bp can be placed on the vector. For example, restriction sites composed
of 7 bp, 6 bp, 5 bp, or 4 bp can be used. In general, the use of smaller
restriction recognition sites will lead to the cloning of less than full-length genes.
In some cases, such as creation of hybridization probes, isolation of smaller cDNA 
clones may be advantageous. 
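
The relationship between site length and cutting frequency can be estimated with a simple calculation. The Python sketch below assumes a random sequence with equal frequencies of the four bases and a fully specified recognition site; real genomes deviate from this, so the figures are rough guides only. The function name is hypothetical.

```python
def expected_spacing_bp(site_length: int) -> int:
    """Average distance between occurrences of a fully specified site of the
    given length in random sequence with equal base frequencies."""
    return 4 ** site_length


for n in (4, 5, 6, 7, 8):
    print(f"{n} bp site: about one occurrence every {expected_spacing_bp(n)} bp")
```

Under these assumptions an 8 bp site occurs roughly once every 65,536 bp, whereas a 4 bp site occurs roughly once every 256 bp, which is why shorter recognition sites are more likely to cut within a cDNA and yield less than full-length clones.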



Bidirectional Activation Vectors

The activation vectors described herein can also be bidirectional. When 
a single activation transcriptional regulatory sequence is present on the vector, 
gene activation occurs only when the vector integrates into an appropriate location 
(e.g. upstream of the gene) and in the correct orientation. That is, in order to 
activate an endogenous gene, the promoter on the activation construct must face
the endogenous gene allowing transcription of the coding strand. As a result of
this directionality requirement, only half of the integration events into a locus may 
result in the transcriptional activation of an endogenous gene. The other half of 
integration events result in the vector transcribing away from a gene of interest. 
Therefore, to increase the gene activation frequency by a factor of two, the
present invention provides bidirectional vectors that may be used to activate an 
endogenous gene regardless of the orientation in which the vector integrates into 
the host cell genome. 
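
The factor-of-two argument can be reproduced by enumerating integration orientations. The Python sketch below assumes that a vector integrates in either orientation with equal probability and that the target gene lies on the "+" strand; both are simplifying assumptions, and the names are hypothetical.

```python
from fractions import Fraction


def activating_fraction(promoter_directions) -> Fraction:
    """Fraction of integration orientations in which at least one promoter
    on the vector faces the endogenous gene.

    promoter_directions: promoter directions relative to the vector,
    +1 or -1. The gene is taken to lie in the +1 direction.
    """
    orientations = (+1, -1)  # the two ways the vector can integrate
    hits = sum(
        any(orientation * direction == +1 for direction in promoter_directions)
        for orientation in orientations
    )
    return Fraction(hits, len(orientations))


unidirectional = activating_fraction([+1])
bidirectional = activating_fraction([+1, -1])
print(unidirectional, bidirectional)
```

With one promoter, half of the orientations are productive; with two opposing promoters, both are, giving the twofold increase described in the text.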

A bidirectional vector according to this aspect of the invention preferably
comprises two transcriptional regulatory sequences (which may be any
transcriptional regulatory sequences, including but not limited to the promoters,
enhancers, and repressors described herein, and which preferably are promoters
or enhancers, and most preferably promoters), two splice donor sites, and a
linearization site. When a splice donor site is useful, each transcriptional
regulatory sequence is operably linked to a separate splice donor site, and the
transcriptional regulatory sequence/splice donor pairs may be in inverse
orientation relative to each other (i.e., the first transcriptional regulatory sequence
may be integrated into the host cell genome in an orientation that is inverse
relative to the orientation in which the second transcriptional regulatory sequence
has integrated into the host cell genome). The two opposing transcriptional
regulatory sequence/splice donor sites can be separated by the linearization site.
The function of the linearization site is to produce free DNA ends between the
transcriptional regulatory sequence/splice donor sites (i.e. in a location suitable for
activation of endogenous genes). Examples of bidirectional vectors of the
invention are shown in Fig. 11A-11C.

The two opposing transcriptional regulatory sequences may be the same 
transcriptional regulatory sequences or different transcriptional regulatory 
sequences. Optionally, a translational start codon (e.g. ATG) and one or more
additional codons may be included on either or both vector encoded exons. When
a translational start codon is present, either or both vector exons may encode a
protein, a portion of a protein, a signal secretion sequence, a portion of a signal 
secretion sequence, a protein motif, or an epitope tag. Alternatively, either or 
both vector exons may lack a translational start codon. 

The bidirectional vectors according to this aspect of the invention may
optionally include one or more selectable markers and one or more amplifiable 
markers, including those selectable markers and amplifiable markers described in 
detail herein. The bidirectional vectors may also be configured as poly(A) trap, 
splice acceptor trap, or dual poly(A)/splice acceptor trap vectors, as described 
above. Other vector configurations described for unidirectional vectors may also 
be incorporated into bidirectional vectors. 

Co-transfection of Genomic DNA with Non-targeted Activation Vectors

It is recognized that any of the vectors described herein can be integrated 
into, or otherwise combined with, genomic DNA prior to transfection into a 
eukaryotic host cell. This permits high level expression from virtually any gene 
in the genome, regardless of the normal expression characteristics of the gene. 
Thus, the vectors of the invention can be used to activate expression from genes 
encoded by isolated genomic DNA fragments. To accomplish this, the vector is 
integrated into, or otherwise combined with, genomic DNA containing at least one 
gene, or portion of a gene. Typically, the activation vector must be positioned 
within or upstream of a gene in order to activate gene expression. Once inserted 
(or joined), the downstream gene may be expressed (as a transcript or a protein) 
by introducing the vector/genomic DNA into an appropriate eukaryotic host cell. 
Following introduction into the host cell, the vector encoded promoter drives 
expression through the gene encoded in the isolated DNA, and following splicing, 
produces a mature mRNA molecule. Using appropriate activation vectors, this 
process allows protein to be expressed from any gene encoded by the transfected 
genomic DNA. In addition, using the methods described herein, cDNA molecules, 
corresponding to genes encoded by the transfected genomic DNA can be 
generated and isolated. 
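
The splicing step described above can be pictured with a toy model. In the Python sketch below, the sequences and exon coordinates are invented, the function name is hypothetical, and splice site recognition is replaced by explicit annotations; it shows only how a vector-encoded exon ends up joined to downstream genomic exons once introns are removed.

```python
def mature_transcript(vector_exon: str, genomic: str, downstream_exons) -> str:
    """Join the vector exon to each annotated downstream exon, dropping the
    intervening (intronic) sequence.

    downstream_exons: (start, end) pairs, 0-based and end-exclusive, for the
    genomic exons that carry splice acceptor sites, in transcript order.
    """
    spliced = [genomic[start:end] for start, end in downstream_exons]
    return vector_exon + "".join(spliced)


# Invented example: lowercase is intronic or flanking, uppercase is exonic.
genomic_fragment = "ccccc" + "AAAGGG" + "tttttttt" + "CCCTTT" + "gg"
exon_coordinates = [(5, 11), (19, 25)]

mrna = mature_transcript("ATGGCT", genomic_fragment, exon_coordinates)
print(mrna)
```

The resulting sequence contains the vector exon followed by the genomic exons and none of the intronic sequence, which is the feature later used to distinguish gene-activated cDNA from unspliced material.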

To achieve stable expression of the activated gene, the transfected 
activation vector/genomic DNA can be integrated into the host cell genome. 
Alternatively, the transfected activation vector/genomic DNA can be maintained 
as a stable episome (e.g. using a viral origin of replication and/or nuclear retention
function - see below). In yet another embodiment, the activated gene may be 
expressed transiently, for example, from a plasmid. 

As used herein, the term "genomic DNA" refers to the unspliced genetic 
material from a cell. Splicing refers to the process of removing introns from genes 
following transcription. Thus, genomic DNA, in contrast to mRNA and cDNA, 
contains exons and introns in an unspliced form. In the present invention, genomic 
DNA derived from eukaryotic cells is particularly useful since most eukaryotic 
genes contain exons and introns, and since many of the vectors of the present 
invention are designed to activate genes encoded in the genomic DNA by splicing 
to the first downstream exon, and removing intervening introns. 

Genomic DNA useful in the present invention may be isolated using any 
method known in the art. A number of methods for isolating high molecular 
weight genomic DNA and ultra-high molecular weight genomic DNA (intact and 
encased in agarose plugs) have been described (Sambrook et al., Molecular
Cloning, Cold Spring Harbor Laboratory Press, (1989)). In addition, commercial 
kits for isolating genomic DNA of various sizes are also available (Gibco/BRL, 
Stratagene, Clontech, etc.). 

The genomic DNA used in the invention may encompass the entire genome 
of an organism. Alternatively, the genomic DNA may include only a portion of 
the entire genome from an organism. For example, the genomic DNA may 
contain multiple chromosomes, a single chromosome, a portion of a chromosome, 
a genetic locus, a single gene, or a portion of a gene. 

Genomic DNA useful in the invention may be substantially intact (i.e. 
unfragmented) prior to introduction into a host cell. Alternatively, the genomic 
DNA may be fragmented prior to introduction into a host cell. This can be 
accomplished by, for example, mechanical shearing, nuclease treatment, chemical 
treatment, irradiation, or other methods known in the art. When the genomic DNA
is fragmented, the fragmentation conditions may be adjusted to produce DNA
fragments of any desirable size. Typically, DNA fragments should be large
enough to contain at least one gene, or a portion of a gene (e.g. at least one exon). 
The genomic DNA may be introduced directly into an appropriate eukaryotic host 
cell without prior cloning. Alternatively, the genomic DNA (or genomic DNA 
fragments) may be cloned into a vector prior to transfection. Useful vectors 
include, but are not limited to, high and intermediate copy number plasmids (e.g. 
pUC, pBluescript, pACYC184, pBR322, etc.), cosmids, bacterial artificial
chromosomes (BAC's), yeast artificial chromosomes (YAC's), P1 artificial
chromosomes (PAC's), and phage (e.g. lambda, M13, etc.). Other cloning vectors
known in the art may also be used. When genomic DNA has been cloned into a
cloning vector, specific cloned DNA fragments may be isolated and used in the
present invention. For example, YAC, BAC, PAC, or cosmid libraries can be
screened by hybridization to identify clones that map to specific chromosomal 
regions. Optionally, once isolated, these clones can be ordered to produce a 
contig through the chromosomal region of interest. To rapidly isolate cDNA 
copies of the genes present in this contig, these genomic clones may be 
transfected, separately or en masse, with the activation vector into a host cell. 
cDNA containing a vector encoded exon, and lacking a vector encoded intron, can 
then be isolated and analyzed. Thus, since all genes present in a contig can be 
rapidly isolated as cDNA clones, this approach greatly enhances the speed of 
positional cloning approaches. 

Any activation vector described herein, including derivatives recognized 
by those skilled in the art, may be co-transfected with genomic DNA, and 
therefore, is useful in the present invention. In its simplest form, the vector can
contain a promoter operably linked to an exon followed by an unpaired splice
donor site. Examples of other useful vectors include, but are not limited to, poly(A)
trap vectors (e.g. vectors illustrated in Figures 8, 9, 11C, 12F, and 17), dual
poly(A)/splice acceptor trap vectors (e.g. vectors illustrated in Figures 9, 10,
12G, 19, and 21), bi-directional vectors (e.g. vectors illustrated in Figure 11),
single exon trap vectors (e.g. the vector illustrated in Figure 19),
multi-promoter/activation exon vectors (e.g. the vector illustrated in Figure 23), vectors
for isolating cDNA's corresponding to activated genes, and vectors for activating 
protein expression from activated genes (e.g. vectors illustrated in Figures 2, 3, 
4, 8B-F, 9B-C, 9E-F, 10B-C, 10E-F, 11, 12, 17B-G, and 23). 

The activation vector may also contain a viral origin of replication. The 
presence of a viral origin of replication allows vectors containing genomic 
fragments to be propagated as an episome in the host cell. Examples of useful
viral origins of replication include ori P (Epstein Barr Virus), SV40 ori, BPV ori, 
and vaccinia ori. To facilitate replication from these origins, the appropriate viral 
replication proteins may be expressed from the vector. For example, EBV ori P 
and SV40 ori containing vectors may also encode and express EBNA-1 or T 
antigen, respectively. Alternatively, the vectors may be introduced into cells that 
are already expressing the viral replication protein (e.g. EBNA-1 or T antigen). 
Examples of cells expressing EBNA-1 and T antigen include human 293 cells 
transfected with an EBNA-1 expression unit (Clontech) and COS-7 cells 
(American Type Culture Collection, ATCC No. CRL-1651), respectively.

The activation vector may also contain an amplifiable marker. This enables 
cells containing increased copies of the vector and flanking genomic DNA, either 
episomal or integrated in the host cell genome, to be isolated. Cells containing 
increased copies of the vector and flanking genomic DNA express the activated 
gene at higher levels, facilitating gene isolation and protein production. 

The activation vector and genomic DNA may be introduced into any host 
cell capable of splicing from the vector-encoded splice donor site to a splice 
acceptor site encoded by the genomic DNA. In a preferred embodiment, the 
genomic DNA/activation vector are transfected into a host cell from the same
species as the cell from which the genomic DNA was isolated. In some instances, 
however, it is advantageous to transfect the genomic DNA into a host cell from 
a species that is different from the cell from which the genomic DNA was isolated. 
For example, transfection of genomic DNA from one species into a host cell of a 
second species can facilitate analysis of the genes activated in the transfected 
genomic DNA using hybridization techniques. Under high stringency
hybridization, activated genes that were encoded by the transfected DNA can be 
distinguished from genes derived from the host cell. Transfection of genomic 
DNA from one species into a host cell from another species can also be used to 
produce protein in a heterologous cell. This may allow protein to be produced in 
heterologous cells that provide growth, protein modification, or manufacturing 
advantages. 

The activation vector may be co-transfected into a host cell along with 
genomic DNA wherein the vector is not attached to the genomic DNA prior to 
introduction into the cell. In this embodiment, the genomic DNA will become 
fragmented during the transfection process, thereby creating free DNA ends. 
These DNA ends can become joined to the co-transfected activation vector by the 
cell's DNA repair machinery. Following joining to the activation vector, the 
genomic DNA and activation vector can be integrated into the host cell genome 
by the process of non-homologous recombination. If, during this process, a vector 
becomes joined to a gene encoded by the transfected genomic DNA, the vector
will activate its expression. 

Alternatively, the non-targeted activation vector may be physically linked 
to the genomic DNA prior to transfection. In a preferred embodiment, genomic 
DNA fragments are ligated to the vector prior to transfection. This is 
advantageous because it maximizes the probability of the vector becoming 
operably linked to a gene encoded by the genomic DNA, and minimizes the 
probability of the vector integrating into the host cell genome without the 
heterologous genomic DNA. 

In a related embodiment, the genomic DNA may be cloned into the 
activation vector, downstream of the activation exon. In this embodiment, cloning 
of large genomic fragments can be facilitated in vectors capable of accommodating 
large genomic fragments. Thus, the activation vector may be constructed in 
BAC's, YAC's, PAC's, cosmids, or similar vectors capable of propagating large 
fragments of genomic DNA. 

Another method for joining the activation vector to genomic DNA
involves transposition. In this embodiment, the activation vector is integrated into
the genomic DNA by transposition or retroviral integration reactions prior to
transfection into a cell. Accordingly, activation vectors can contain cis sequences
necessary for facilitating transposition and/or retroviral integration. Examples of
vectors containing transposon signals are illustrated in Figure 27; however, it is
recognized that any vector described herein may contain transposon signals. 

Any transposition system capable of inserting foreign sequences into 
genomic DNA can be used in the present invention. In addition, transposons 
capable of facilitating inversions and deletions can also be used to practice the 
invention. While deletion and inversion systems do not integrate the activation 
vector into genomic DNA, they do allow the activation vector to change positions 
relative to cloned genomic DNA when the genomic DNA has been cloned into the 
activation vector. Thus, multiple genes within a given genomic fragment can be 
activated by shuffling the activation vector (by integration, inversion, or deletion) 
into multiple positions within, or outside of, the genomic fragment. Examples of 
transposition systems useful for the present invention include, but are not limited 
to γδ, Tn3, Tn5, Tn7, Tn9, Tn10, Ty, retroviral integration and retro-transposons
(Berg et al., Mobile DNA, ASM Press, Washington DC, pp. 879-925 (1989);
Strathman et al., Proc. Natl. Acad. Sci. USA 88:1247 (1991); Berg et al., Gene
113:9 (1992); Liu et al., Nucl. Acids Res. 15:9461 (1987); Martin et al., Proc.
Natl. Acad. Sci. USA 92:8398 (1995); Phadnis et al., Proc. Natl. Acad. Sci. USA
86:5908 (1989); Tomcsanyi et al., J. Bacteriol. 172:6348 (1990); Way et al., Gene
32:369 (1984); Bainton et al., Cell 65:805 (1991); Ahmed et al., J. Mol. Biol.
178:941 (1984); Benjamin et al., Cell 59:373 (1989); Brown et al., Cell 49:341
(1987); Eichinger et al., Cell 54:955 (1988); Eichinger et al., Genes Dev. 4:324
(1990); Braiterman et al., Mol. Cell. Biol. 14:5719 (1994); Braiterman et al., Mol.
Cell. Biol. 14:5731 (1994); York et al., Nucl. Acids Res. 26:1921 (1998); Devine
et al., Nucl. Acids Res. 18:3165 (1994); Goryshin et al., J. Biol. Chem. 273:1361
(1998)).

Using transposition, an activation vector may be integrated into any form
of genomic DNA. For example, the activation vector may be integrated into either
intact or fragmented genomic DNA. Alternatively, the activation vector may be
integrated into a cloned fragment of genomic DNA (Figure 28). In this
embodiment, the genomic DNA may reside in any cloning vector, including high
and intermediate copy number plasmids (e.g. pUC, pBluescript, pACYC184,
pBR322, etc.), cosmids, bacterial artificial chromosomes (BAC's), yeast artificial
chromosomes (YAC's), P1 artificial chromosomes (PAC's), and phage (e.g.
lambda, M13, etc.). Other cloning vectors known in the art may also be used. As
described above, genomic fragments from specific genetic loci may be isolated and
used as a substrate for activation vector integration.

Following integration of the activation vector, the genomic DNA may be
introduced directly into a suitable host cell for expression of the activated gene.
Alternatively, the genomic DNA may be introduced into and propagated in an
intermediate host cell. For example, following integration of an activation vector
into a BAC genomic library, the BAC library can be transformed into E. coli. This
allows plasmids containing the transposon to be enriched by selecting for an
antibiotic resistance marker residing on the activation vector. As a result, BAC
plasmids lacking an integrated activation vector will be removed by antibiotic
selection.

The transposition mediated activation vector integration may occur in vitro
using purified enzymes. Alternatively, the transposition reaction may occur
in vivo. For example, transposition may be carried out in bacteria, using a donor
strain carrying the transposon either on a vector or as integrated copies in the
genome. A target of interest is introduced into the transposer host where it
receives integrations. Targets bearing insertions are then recovered from the host
by genetic selection. Similarly, eukaryotic host cells, such as yeast, plant, insect,
or mammalian cells, can be used to carry out the transposon mediated integration
of an activation vector into a fragment of genomic DNA.

Isolation of mRNA and cDNA Produced from Activated Endogenous Genes

In additional embodiments, the present invention is directed to methods for 
isolating genes, particularly genes contained within the genome of a eukaryotic 
cell, that are activated using the vectors of the invention. These methods exploit 
the structure of the mRNA molecules produced using the non-targeted gene 
activation vectors of the invention. The methods of the invention described herein 
allow virtually any activated gene to be isolated, regardless of whether it has been 
previously isolated and characterized, and regardless of whether it has a known 
biological activity. This is made possible by the nature of the chimeric transcripts 
produced from the integrated vectors of the present invention. Using methods
described herein, activation vectors can be integrated into the genome of a cell. 
Typically, the activation vectors, however, are integrated into the genome of many 
cells to produce a library of unique integration events. Each member of the library 
contains the vector located at a unique integration site(s), and potentially contains 
an activated endogenous gene. Gene activation occurs when the activation vector 
integrates upstream of the 3'-most exon of an endogenous gene and in an
orientation capable of allowing transcription from the vector to proceed through 
the endogenous gene. The integration site may be in an intron or exon of the 
endogenous gene, or may be upstream of the transcription start site of the gene. 
Following integration, the activation constructs are designed to produce a 
transcript capable of splicing from an exon encoded by the activation vector to an 
exon encoded by the endogenous gene. As a result, a chimeric message is 
produced that contains the vector exon linked to the exons from an endogenous 
gene, wherein the endogenous exons are derived from the region located 
downstream of the vector integration site. The structure of this chimeric transcript 
can be exploited for gene discovery purposes. For example, the chimeric 
transcripts can be rapidly isolated to use as probes (to isolate the full length cDNA 
or genomic copy of the gene or to characterize the gene) or for direct sequencing 
and/or characterization. 
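
The integration rule stated above (upstream of the 3'-most exon, and in the
transcribing orientation) can be sketched as a simple coordinate check. This is
an illustrative model only: the Gene class, the coordinate convention, and the
example positions are assumptions introduced here and are not part of the
disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene model: strand plus exon (start, end) coordinates."""
    strand: str                    # "+" or "-"
    exons: List[Tuple[int, int]]   # each exon as (start, end), start < end


def can_activate(gene: Gene, insert_pos: int, vector_strand: str) -> bool:
    """Return True if a vector at insert_pos could splice into the gene.

    Two conditions from the text are modeled: the vector must transcribe in
    the same direction as the gene, and it must lie upstream of the 3'-most
    exon (in an intron, in an exon, or upstream of the transcription start).
    """
    if vector_strand != gene.strand:
        return False  # transcription would read the non-coding strand
    if gene.strand == "+":
        last_exon_start = max(start for start, _ in gene.exons)
        return insert_pos < last_exon_start
    # On the minus strand the 3'-most exon has the lowest coordinates,
    # so "upstream" means a higher coordinate than that exon's end.
    last_exon_end = min(end for _, end in gene.exons)
    return insert_pos > last_exon_end


gene = Gene(strand="+", exons=[(1000, 1100), (2000, 2100), (3000, 3200)])
print(can_activate(gene, 500, "+"))    # upstream of the transcription start
print(can_activate(gene, 2500, "+"))   # within the last intron
print(can_activate(gene, 3100, "+"))   # inside the 3'-most exon
print(can_activate(gene, 500, "-"))    # opposite orientation
```

Only the two geometric conditions are captured; splice site strength, reading
frame, and chromatin context are deliberately left out of this sketch.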

To isolate the chimeric transcripts activated by vector insertion, cDNA is
produced from a library member containing the activation event. It is also possible
to isolate chimeric transcripts from pools of library members in order to increase
the through-put of the procedure. cDNA can then be produced from the mRNA
harvested from the activated cells. Alternatively, total RNA may be used to
produce cDNA. In either case, first strand synthesis can be carried out using an
oligo dT primer, an oligo dT/poly(A) signal primer, or a random primer. To
facilitate cloning of the cDNA product, a poly dT based primer can be used with
the structure: 5'-Primer X(dT) M00 -3'. The oligo dT/poly(A) signal primer can
have the structure 5'-(dT)10-30-Primer X-N0-6-TTTATT-3'. The random primer
can have the structure: 5'-(Primer X)NNNNNN-3'. In each primer, Primer X is
any sequence that can be used to subsequently PCR amplify target nucleic acid
molecules. Where the activated gene amplification product is to be cloned, it is
useful to include one or more restriction sites within the Primer X sequence to
facilitate subsequent cloning. Other primers recognized by those skilled in the art
can be used to create first strand cDNA products, including primers that lack a
Primer X region.
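
As a rough illustration of the primer layouts described above, the following
sketch assembles the oligo dT/poly(A) signal primer and the random primer as
strings. The Primer X sequence shown is hypothetical (an arbitrary sequence
that happens to include a NotI recognition site, GCGGCCGC, as an example of an
embedded restriction site), and the function names are assumptions made for
this example.

```python
# Hypothetical Primer X: arbitrary sequence carrying a NotI site (GCGGCCGC).
PRIMER_X = "GACTAGTTCTAGATCGCGAGCGGCCGC"


def polya_signal_primer(primer_x: str, dt_len: int = 20, n_len: int = 3) -> str:
    """Build 5'-(dT)10-30-Primer X-N0-6-TTTATT-3' as written in the text."""
    if not 10 <= dt_len <= 30:
        raise ValueError("dT tract must be 10-30 nucleotides")
    if not 0 <= n_len <= 6:
        raise ValueError("N spacer must be 0-6 nucleotides")
    return "T" * dt_len + primer_x + "N" * n_len + "TTTATT"


def random_primer(primer_x: str) -> str:
    """Build 5'-(Primer X)NNNNNN-3', i.e. Primer X plus a random hexamer."""
    return primer_x + "N" * 6


print(polya_signal_primer(PRIMER_X))
print(random_primer(PRIMER_X))
```

Here "N" stands for a degenerate position; an actual synthesis order would
specify mixed bases at those positions.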

In accordance with the invention, the primers may be conjugated with one
or more hapten molecules to facilitate subsequent isolation of nucleic acid
molecules (e.g., first and/or second strand cDNA products) comprising such
primers. After the primer becomes associated with the nucleic acid molecule (via
incorporation during cDNA synthesis), selective isolation of the molecule
containing the haptenylated primer may be accomplished using a corresponding
ligand which specifically interacts with and binds to the hapten via ligand-hapten
interactions. In preferred such aspects, the ligand may be bound to, for example,
a solid support. Once bound to the solid support, the molecules of interest
(haptenylated primer-containing nucleic acid molecules) can be separated from
contaminating nucleic acids and other materials by washing the support matrix
with a solution, preferably a buffer or water. Cleavage of one or more of the
cleavage sites within the primer, or treatment of the solid support containing
the nucleic acid molecule with a high ionic strength elution buffer, then allows for
removal of the nucleic acid molecule of interest from the solid support.

Preferred solid supports for use in this aspect of the invention include, but 
are not limited to, nitrocellulose, diazocellulose, glass, polystyrene, 
polyvinylchloride, polypropylene, polyethylene, dextran, Sepharose, agar, starch, 
nylon, latex beads, magnetic beads, paramagnetic beads, superparamagnetic beads 
or microtitre plates and most preferably a magnetic bead, a paramagnetic bead or 
a superparamagnetic bead, that comprises one or more ligand molecules 
specifically recognizing and binding to the hapten molecule on the primer. 

Particularly preferred hapten molecules for use on the primer molecules of
the invention include, without limitation: (i) biotin; (ii) an antibody; (iii) an
enzyme; (iv) lipopolysaccharide; (v) apotransferrin; (vi) ferrotransferrin; (vii)
insulin; (viii) cytokines (growth factors, interleukins or colony-stimulating factors);
(ix) gp120; (x) β-actin; (xi) LFA-1; (xii) Mac-1; (xiii) glycophorin; (xiv) laminin;
(xv) collagen; (xvi) fibronectin; (xvii) vitronectin; (xviii) integrins αvβ1 and αvβ3;
(xix) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, αvβ3 and αvβ6; (xx) integrins
α1β1, α2β1, α3β1 and αvβ3; (xxi) integrins α1β1, α2β1, α3β1, α6β1, α7β1 and α6β5;
(xxii) ankyrin; (xxiii) C3bi, fibrinogen or Factor X; (xxiv) ICAM-1 or ICAM-2;
(xxv) spectrin or fodrin; (xxvi) CD4; (xxvii) a cytokine (e.g., growth factor,
interleukin or colony-stimulating factor) receptor; (xxviii) an insulin receptor;
(xxix) a transferrin receptor; (xxx) Fe+++; (xxxi) polymyxin B or endotoxin-
neutralizing protein (ENP); (xxxii) an enzyme-specific substrate; (xxxiii) protein
A, protein G, a cell-surface Fc receptor or an antibody-specific antigen; and
(xxxiv) avidin and streptavidin. Particularly preferred is biotin.

Particularly preferred ligand molecules according to this aspect of the
invention, which correspond in order to the above-described hapten molecules,
include without limitation: (i) avidin and streptavidin; (ii) protein A, protein G, a
cell-surface Fc receptor or an antibody-specific antigen; (iii) an enzyme-specific
substrate; (iv) polymyxin B or endotoxin-neutralizing protein (ENP); (v) Fe+++;
(vi) a transferrin receptor; (vii) an insulin receptor; (viii) a cytokine (e.g., growth
factor, interleukin or colony-stimulating factor) receptor; (ix) CD4; (x) spectrin
or fodrin; (xi) ICAM-1 or ICAM-2; (xii) C3bi, fibrinogen or Factor X; (xiii)
ankyrin; (xiv) integrins α1β1, α2β1, α3β1, α6β1, α7β1 and α6β5; (xv) integrins α1β1,
α2β1, α3β1 and αvβ3; (xvi) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, αvβ3 and
αvβ6; (xvii) integrins αvβ1 and αvβ3; (xviii) vitronectin; (xix) fibronectin; (xx)
collagen; (xxi) laminin; (xxii) glycophorin; (xxiii) Mac-1; (xxiv) LFA-1; (xxv)
β-actin; (xxvi) gp120; (xxvii) cytokines (growth factors, interleukins or colony-
stimulating factors); (xxviii) insulin; (xxix) ferrotransferrin; (xxx) apotransferrin;
(xxxi) lipopolysaccharide; (xxxii) an enzyme; (xxxiii) an antibody; and (xxxiv)
biotin. Particularly preferred, for use with biotinylated primers of the invention,
are avidin and streptavidin.

Following first strand synthesis, second strand cDNA synthesis may be 
carried out using a primer specific for the vector encoded exon. This creates 
double stranded cDNA from all transcripts that were derived from the vector 
encoded promoter. All cellular mRNA (and cDNA) produced from endogenous 
promoters remains single stranded since the transcript lacks a vector exon at its 5'
end. Once second strand synthesis is carried out, the cDNA may be digested with 
a restriction enzyme, cloned into a vector, and propagated. 

To facilitate cloning, cDNA molecules containing the vector exon are 
amplified by PCR using a primer specific for the vector exon and a primer specific 
for the first strand cDNA primer (e.g. Primer X). PCR amplification results in the 
production of variable length DNA fragments representing different locations of 
priming during first strand synthesis and/or amplification of multiple chimeric 
transcripts from different genes. These amplification products can be cloned into 
plasmids for characterization, or can be labeled and used as a probe. 
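
The selectivity of this amplification step can be illustrated with a toy
string model. Each cDNA is represented by its sense strand; a product is
returned only when the vector exon primer site is followed by the complement
of the first strand primer tag. The sequences and the names VECTOR_EXON_PRIMER
and PRIMER_X are hypothetical placeholders, not sequences taken from the
vectors described herein.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pcr_product(sense_strand: str, fwd_primer: str, rev_primer: str):
    """Return the amplified region of sense_strand, or None if a primer fails.

    fwd_primer matches the sense strand directly; rev_primer anneals to the
    sense strand, so its reverse complement is searched for downstream.
    """
    start = sense_strand.find(fwd_primer)
    if start == -1:
        return None
    rev_site = revcomp(rev_primer)
    end = sense_strand.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return sense_strand[start:end + len(rev_site)]


VECTOR_EXON_PRIMER = "ATGGCTACAGGA"   # hypothetical vector exon sequence
PRIMER_X = "GACTAGTGCGGCCGC"          # hypothetical first strand primer tag

chimeric = VECTOR_EXON_PRIMER + "CCTGAAGTTCATCTGC" + revcomp(PRIMER_X)
endogenous = "TTGACCGGTAACCTGAAGTTC" + revcomp(PRIMER_X)

print(pcr_product(chimeric, VECTOR_EXON_PRIMER, PRIMER_X))
print(pcr_product(endogenous, VECTOR_EXON_PRIMER, PRIMER_X))
```

In this model the transcript from an endogenous promoter yields no product
because it lacks the vector exon primer site, which mirrors the reasoning in
the text; mispriming and partial matches are ignored.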

Other amplification techniques, such as linear amplification using RNA 
polymerase (Van Gelder, Proc. Natl. Acad. Sci. USA 87:1663-1667 (1990);
Eberwine, Methods 10:283-288 (1996)), can be used. For example, when linear
amplification by RNA polymerase is used, a promoter (e.g. T7 promoter) can be 
placed on the vector exon. As a result, gene activated transcripts will contain the
promoter sequence at the 5' end of the transcript. Alternatively, a promoter can
be ligated onto the cDNA molecule following first strand and second strand 
synthesis. Using either strategy, RNA polymerase is then incubated with cDNA 
in the presence of ribonucleotide triphosphates to create RNA transcripts from the 
cDNA. These transcripts are then reverse transcribed to produce cDNA. Since 
RNA polymerase can create several thousand transcripts from a single cDNA 
molecule, and since each of these transcripts can be reverse transcribed into 
cDNA, a large amplification can be achieved. As with PCR, amplification with 
RNA polymerase can facilitate cloning of activated genes. Other types of 
amplification strategies are also possible. 

In another embodiment, the vector exon containing cDNA molecules are 
isolated without amplification. This may be useful in instances where biases occur 
during amplification (for example, when one DNA fragment amplifies more 
efficiently than another). To produce cDNA enriched for tagged messages, RNA 
is isolated from the activation library. A primer (e.g. a random hexamer,
oligo(dT), or hybrid primers containing a primer linked to poly(dT) or random
nucleotides) is annealed to the RNA and used to direct first strand synthesis. The
first strand cDNA molecules are then hybridized to a primer specific for the vector 
encoded exon. This primer directs second strand synthesis. Following second 
strand synthesis, the cDNA may be digested with restriction enzymes that cut in 
the vector exon and in the first strand primer (e.g. in Primer X - see above). The 
second strand products may then be cloned into a useful vector to allow them to 
be propagated. 

It will be apparent to one of ordinary skill in view of the description 
contained herein that the cDNA products made according to the methods of the 
invention may also be cloned into a cloning vector suitable for transfection or 
transformation of a variety of prokaryotic (bacterial) or eukaryotic (yeast, plant 
or animal including human and other mammalian) cells. Such cloning vectors, 
which may be expression vectors, include but are not limited to chromosomal-, 
episomal- and virus-derived vectors, e.g., vectors derived from bacterial plasmids
or bacteriophages, and vectors derived from combinations thereof, such as
cosmids and phagemids, BACs, MACs, YACs, and the like. Other vectors
suitable for use in accordance with this aspect of the invention, and methods for
insertion of DNA fragments therein and transformation of host cells with such
cloning vectors, will be familiar to those of ordinary skill in the art.

Removal of Unspliced Transcription Products 

In some instances, the activation vector will integrate into the genome in 
a region lacking genes. Alternatively, it may integrate into a region containing a 
gene(s), but be oriented in a manner that results in the transcription of the
non-coding strand. In each of these instances, gene activated transcripts are produced
that contain normally untranscribed DNA sequences next to the vector encoded
exon. These sequences would complicate identification and analysis of novel 
genes. Therefore, it would be advantageous to selectively remove these genomic 
molecules. 

To remove cDNA molecules that contain a vector encoded intron, the
double strand cDNA is treated with a restriction enzyme that recognizes a
sequence located in the vector encoded intron. Preferably, the restriction enzyme
creates an overhang that is different from the overhang produced by cleavage of
the vector exon. This ensures the cloning of only activated genes by preventing
the cleavage products from ligating into the cloning vector.
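
A minimal sketch of this filtering step follows. It assumes, purely for
illustration, that the vector intron carries a NotI site (GCGGCCGC, cut after
the second base) and that any cDNA cleaved at that site is lost from the
cloned population; the sequences and function names are hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def clonable(cdnas: list, intron_site: str, cut_offset: int = 2) -> list:
    """Keep only cDNAs that survive digestion intact (no intron site)."""
    return [c for c in cdnas if len(digest(c, intron_site, cut_offset)) == 1]


INTRON_SITE = "GCGGCCGC"                            # assumed site in the vector intron
spliced = "ATGGCTACAG" + "CCTGAAGTTC"               # vector exon joined to an exon
unspliced = "ATGGCTACAG" + "GTGAGTAAGCGGCCGCTTAC"   # vector exon plus retained intron

print(clonable([spliced, unspliced], INTRON_SITE))
```

The sketch does not model overhang compatibility; it only captures the idea
that intron-containing molecules are cleaved and therefore excluded.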

Recovery of Exon I from activated endogenous genes 

To recover exon I from activated genes, specialized vectors can be used
to create non-targeted gene activation libraries. In its simplest form, this vector
contains, from 5' to 3', a promoter, an unpaired splice donor site, and a second
promoter. The downstream promoter is oriented in the same direction as the
upstream promoter. Upon integration upstream of an endogenous gene, this type
of vector produces two types of transcripts. The first transcript contains the
vector exon joined to exon II of the endogenous gene. Methods for isolating this
transcript are described above. The second transcript contains the upstream
region of the endogenous gene followed by exon I joined to exon II and other
downstream exons from the endogenous gene (Figure 6).

Using a two step process, exon I can be recovered from cells containing 
the integrated vector. First, vector exon containing transcripts (i.e. Transcript 
type #1, Figure 13) are isolated using the methods described above. Once 
isolated, the 5' end of the transcript including exon II can be sequenced to 
determine the sequence of the flanking endogenous exons. Second, once the 
sequence of the flanking endogenous exons is known, PCR primers capable of 
annealing to exon II (or a downstream exon) of the activated gene can be 
developed. These primers can be used to amplify exon I from Transcript #2 
(Figure 13) using a modified form of inverse PCR (Zeiner, M., Biotechniques
17(6):1051-1053 (1994)). Briefly, amplification of exon I from the endogenous
gene is achieved by carrying out first strand cDNA synthesis with a gene specific 
primer, based on the sequence information determined above. Second strand 
synthesis can be carried out using E. coli DNA polymerase I under conditions well 
known to those skilled in the art. The double strand cDNA is then digested with 
a restriction enzyme that cleaves at least once in the endogenous gene upstream 
of the first strand cDNA primer, and that does not cleave in the vector exon. 
Following digestion, the cDNA is self ligated to produce circular molecules. 
Using inverted PCR primers that anneal in the endogenous gene upstream of the 
restriction/circularization site, amplification by PCR produces a DNA product 
containing exon I sequences from the endogenous gene. 
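
The digestion, self-ligation, and inverse PCR steps above can be approximated
with a simple string model. This is a sketch under stated simplifications: the
cDNA is treated as a single top strand, the enzyme is assumed to cut at the
start of its site, end compatibility for ligation is ignored, and all
sequences (EXON_I, EXON_II and the primer sites) are invented for the example.

```python
def inverse_pcr_product(cdna: str, site: str, fwd_primer: str, rev_anneal_site: str):
    """Simulate digestion, circularization and inverse PCR on a linear cDNA.

    cdna             top strand, from the unknown 5' end (exon I) to the
                     gene specific first strand primer
    site             restriction site lying in known sequence (e.g. exon II)
    fwd_primer       known top-strand sequence read toward the restriction site
    rev_anneal_site  known top-strand sequence where the outward-facing
                     reverse primer anneals (upstream of fwd_primer)
    """
    cut = cdna.find(site)
    if cut == -1:
        return None
    fragment = cdna[:cut]          # simplification: cut at the start of the site
    fwd = fragment.find(fwd_primer)
    rev = fragment.find(rev_anneal_site)
    if fwd == -1 or rev == -1 or rev + len(rev_anneal_site) > fwd:
        return None
    # After circularization the 3' end of the fragment is joined to its 5' end,
    # so extension from fwd_primer crosses the junction into the unknown region.
    return fragment[fwd:] + fragment[:rev + len(rev_anneal_site)]


EXON_I = "ATGGCTAGCAAGGTC"                                          # treated as unknown
EXON_II = "TTCACCGGTACC" + "AGGCTTCAAC" + "GAATTC" + "CCGTAGGATC"   # known sequence
cdna = EXON_I + EXON_II

product = inverse_pcr_product(cdna, "GAATTC", "AGGCTTCAAC", "TTCACCGG")
print(product)
print(EXON_I in product)
```

The product places the previously unknown exon I sequence between the two
known primer sites, which is the property the method relies on.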

Method for Selecting Cells Containing Higher Levels of Gene Activated 
Transcripts/Protein 

In several embodiments of the disclosed invention, the activation vector 
contains an amplifiable marker (e.g. DHFR) and a viral origin of replication (e.g.
EBV ori P). In other embodiments, an amplifiable marker and viral origin of
replication are present on a cloning vector containing a cloned fragment of
genomic DNA. In yet another embodiment, the activation vector contains one
element (e.g. DHFR) and a cloning vector carrying a genomic insert contains the
other element (e.g. ori P). Regardless of the initial location of the amplifiable
marker and viral origin, the elements are combined on the same DNA molecule
prior to or during introduction into a host cell.

In addition to the cis-acting elements, a trans-acting viral protein is 
generally required for efficient replication of the episomes. Examples of 
trans-acting viral proteins include EBNA-1 and SV40 T antigen. To promote 
efficient replication of episomes, the trans-acting viral protein can be expressed 
from the episome. Thus, the viral trans-acting protein may be expressed from the 
transposing activation vector, or may be positioned on the backbone of the cloning 
vector. Alternatively, the trans-acting viral protein may be expressed by the 
eukaryotic host cells into which the episome is introduced. 

Once the amplifiable marker and viral origin of replication are on the same 
molecule and present in a host cell expressing the appropriate viral replication 
protein(s), the copy number of the episome can be increased. To increase the 
copy number of the episome, the cells can be placed under the appropriate 
selection. For example, if DHFR is present on the episome, methotrexate may be 
added to the culture. The selective agent may be applied at relatively high 
concentrations to isolate cells in the population that already have a high episome 
copy number. Alternatively, the selective agent may be applied at lower 
concentrations, and periodically increased in concentration. Two-fold increases 
in drug concentration will result in step-wise increases in copy number. 
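
The stepwise selection scheme can be laid out as a simple concentration
schedule. The sketch below assumes two-fold steps as mentioned in the text;
the starting and final methotrexate concentrations (20 and 320, in arbitrary
units such as nM) are hypothetical values chosen only to show the arithmetic.

```python
def stepwise_schedule(start_conc: float, final_conc: float, fold: float = 2.0) -> list:
    """List the drug concentrations for successive selection rounds."""
    if start_conc <= 0 or fold <= 1:
        raise ValueError("start_conc must be positive and fold greater than 1")
    concentrations = [start_conc]
    while concentrations[-1] * fold <= final_conc:
        concentrations.append(concentrations[-1] * fold)
    return concentrations


# Hypothetical example: begin at 20 units and double until 320 units is reached.
for round_number, conc in enumerate(stepwise_schedule(20, 320), start=1):
    print(f"round {round_number}: {conc:g}")
```

The number of rounds grows with the logarithm of the overall concentration
range, so even a 16-fold increase requires only five selection steps here.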

To reduce the frequency of non-specific drug resistance (i.e. drug 
resistance that is not associated with increased copy number of the episome), more 
than one amplifiable marker can be placed on the vector. Inclusion of multiple 
amplifiable markers on the episome allows cells to be selected with multiple drugs 
(either simultaneously or sequentially). Since non-specific drug resistance is a
relatively rare event, the probability of a cell developing non-specific drug
resistance to multiple drugs is exceedingly low. Thus, the presence of multiple
amplifiable markers on the episome facilitates isolation of cells that have a high
episome copy number.
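
The reasoning behind using several markers can be made concrete with a short
calculation. It assumes that non-specific resistance to each drug arises
independently, so the joint frequency is the product of the individual
frequencies; the per-drug frequency of 1e-4 is a hypothetical figure, not a
measured value.

```python
import math


def joint_nonspecific_frequency(per_drug_frequencies: list) -> float:
    """Frequency of non-specific resistance to all drugs, assuming independence."""
    return math.prod(per_drug_frequencies)


# Hypothetical per-drug frequency of non-specific resistance.
single = 1e-4
print(joint_nonspecific_frequency([single]))
print(joint_nonspecific_frequency([single, single]))
print(joint_nonspecific_frequency([single, single, single]))
```

If resistance mechanisms are shared between drugs (for example, a general
efflux mechanism), the independence assumption fails and the benefit would be
smaller than this estimate suggests.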

Amplification of episome copy number increases the number of transcripts
derived from the vector activated gene. This, in turn, facilitates isolation of cDNA
molecules derived from the activated gene. Furthermore, amplification of episome
copy number can dramatically increase protein expression from the activated gene.
Higher levels of protein production facilitate generation of proteins for bioassay
screening, cell assay screening, and manufacturing purposes.

As a result of the highly desirable characteristics described above, vectors
containing a viral origin of replication and an amplifiable marker, and the use of
these vectors to rapidly amplify the copy number of episomal vectors, represent
a breakthrough that extends beyond the scope of activating expression of genes
present in genomic DNA. For example, these vectors can be used to over-express
cDNA encoded genes to produce high levels of protein expression without the
need to integrate the gene into a host cell genome with an amplifiable marker.
Furthermore, like amplification of chromosomal sequences, cells possessing several
hundred to several thousand episomal copies of the vector can be isolated and
maintained in culture. Thus, the vectors described herein, and their uses, allow
high levels of cloned genomic DNA to be propagated in mammalian cells, facilitate
isolation of cDNA copies of genes present on the vector as genomic inserts, and
maximize protein production from cloned cDNA and genomic copies of
eukaryotic genes.

Other suitable modifications and adaptations to the methods and
applications described herein will be readily apparent to one of ordinary skill in the
relevant arts and may be made without departing from the scope of the invention
or any embodiment thereof. Having now described the present invention in detail,
the same will be more clearly understood by reference to the following examples,
which are included herewith for purposes of illustration only and are not intended
to be limiting of the invention.

EXAMPLES

Example 1: Transfection of Cells for Activation of Endogenous Gene 
Expression 

Method: Construction of pRIG-1 

Human DHFR was amplified by PCR from cDNA produced from HT1080
cells using the primers DHFR-F1
(5' TCCTTCGAAGCTTGTCATGGTTGGTTCGCTAAACTGCAT 3')
(SEQ ID NO:1) and DHFR-R1
(5' AAACTTAAGATCGATTAATCATTCTTCTCATATACTTCAA 3')
(SEQ ID NO:2), and cloned into the T site in
pTARGET™ (Promega) to create pTARGET:DHFR. The RSV promoter was
isolated from pREP9 by digestion with NheI and XbaI and inserted into the NheI
site of pTARGET:DHFR to create pTgT:RSV+DHFR. Oligonucleotides JH169
(5' ATCCACCATGGCTACAGGTGAGTACTCG 3') (SEQ ID NO:3) and JH170
(5' GATCCGAGTACTCACCTGTAGCCATGGTGGATTTAA 3') (SEQ ID
NO:4) were annealed and inserted into the I-Ppo-I and NheI sites of
pTgT:RSV+DHFR to create pTgT:RSV+DHFR+Ex1. A 279 bp region
corresponding to nucleotides 230-508 of pBR322 was PCR amplified using
primers Tet F1 (5' GGCGAGATCTAGCGCTATATGCGTTGATGCAAT 3')
(SEQ ID NO:5) and Tet F2
(5' GGCCAGATCTGCTACCTTAAGAGAGCCGAAACAAGCGCTCATGAGCCCGAA 3')
(SEQ ID NO:6). Amplification
products were digested with BglII and cloned into the BamHI site of
pTgT:RSV+RSV+DHFR+Ex1 to create pRIG-1.
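
As a consistency check on the annealed oligonucleotides, the short sketch
below confirms that JH170 contains the reverse complement of JH169 and reports
the single-stranded extensions that flank the duplex. The two sequences are
those given above for SEQ ID NO:3 and SEQ ID NO:4; the helper names are
assumptions made for this example, and no claim is made here about which
enzyme ends each extension is compatible with.

```python
JH169 = "ATCCACCATGGCTACAGGTGAGTACTCG"
JH170 = "GATCCGAGTACTCACCTGTAGCCATGGTGGATTTAA"


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def flanking_extensions(top: str, bottom: str):
    """Return the 5' and 3' unpaired ends of bottom when annealed to top.

    Returns None if bottom does not contain the full reverse complement of top.
    """
    core = revcomp(top)
    index = bottom.find(core)
    if index == -1:
        return None
    return bottom[:index], bottom[index + len(core):]


five_prime, three_prime = flanking_extensions(JH169, JH170)
print("5' extension of JH170:", five_prime)
print("3' extension of JH170:", three_prime)
print("length of pBR322 region 230-508:", 508 - 230 + 1)
```

The same arithmetic confirms that nucleotides 230-508 span 279 positions, in
agreement with the fragment size stated in the text.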

Transfection - Creation of pRIG-1 Gene Activation Library in HT1080 Cells

To activate gene expression, a suitable activation construct is selected 
from the group of constructs described above. The selected activation construct 
is then introduced into cells by any transfection method known in the art. 
Examples of transfection methods include electroporation, lipofection, calcium
phosphate precipitation, DEAE dextran, and receptor mediated endocytosis. 
Following introduction into the cells, the DNA is allowed to integrate into the host 
cell's genome via non-homologous recombination. Integration can occur at 
spontaneous chromosome breaks or at artificially induced chromosomal breaks. 

Method: Transfection of human cells with pRIG1. 2 x 10^9 HH1 cells, an
HPRT subclone of HT1080 cells, were grown in 150 mm tissue culture plates to
90% confluency. Media was removed from the cells and saved as conditioned
media (see below). Cells were removed from the plate by brief incubation with
trypsin, added to media/10% fetal bovine serum to neutralize the trypsin, and
pelleted at 1000 rpm in a Jouan centrifuge for 5 minutes. Cells were washed in 1X
PBS, counted, and repelleted as above. The cell pellet was resuspended at
2.5 x 10^7 cells/ml final in 1X PBS (Gibco BRL Cat #14200-075). Cells were then
exposed to 50 rads of γ irradiation from a 137Cs source. pRIG1 (Fig. 14A-14B;
SEQ ID NO:18) was linearized with BamHI, purified with phenol/chloroform,
precipitated with ethanol, and resuspended in PBS. Purified and linearized
activation construct was added to the cell suspension to produce a final
concentration of 40 µg/ml. The DNA/irradiated cell mixture was then mixed and
400 µl was placed into each 0.4 cm electroporation cuvette (Biorad). The
cuvettes were pulsed at 250 Volts, 600 µFarads, 50 Ohms using an
electroporation apparatus (Biorad). Following the electric pulse, the cells were
incubated at room temperature for 10 minutes, and then placed into
αMEM/10% FBS containing penicillin/streptomycin (Gibco/BRL). The cells were
then plated at approximately 7 x 10^6 cells/150 mm plate containing 35 ml
αMEM/10% FBS/penstrep (33% conditioned media/67% fresh media). Following
a 24 hour incubation at 37°C, G418 (Gibco/BRL) was added to each plate to a
final concentration of 500 µg/ml from a 60 mg/ml stock. After 4 days of selection,
the media was replaced with fresh αMEM/10% FBS/penstrep/500 µg/ml G418.
The cells were then incubated for another 7-10 days and the culture supernatant
assayed for the presence of new protein factors or stored at -80°C for later
analysis. The drug resistant clones can be stored in liquid nitrogen for later
analysis.
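
The quantities in this method can be cross-checked with a few lines of
arithmetic. The sketch below uses only the figures stated above (40 µg/ml DNA,
2.5 x 10^7 cells/ml, 400 µl per cuvette, 35 ml per plate, 500 µg/ml G418 from
a 60 mg/ml stock); the function names are assumptions made for this example,
and the G418 volume ignores the small increase in culture volume caused by
adding the stock.

```python
def amount_per_cuvette(conc_per_ml: float, cuvette_volume_ml: float) -> float:
    """Amount (cells or micrograms) contained in one cuvette."""
    return conc_per_ml * cuvette_volume_ml


def stock_volume_ul(final_ug_per_ml: float, culture_ml: float, stock_mg_per_ml: float) -> float:
    """Microliters of stock needed to reach final_ug_per_ml in culture_ml."""
    total_ug = final_ug_per_ml * culture_ml
    stock_ug_per_ul = stock_mg_per_ml       # 1 mg/ml equals 1 ug/ul
    return total_ug / stock_ug_per_ul


dna_ug = amount_per_cuvette(40, 0.4)        # ug of linearized DNA per cuvette
cells = amount_per_cuvette(2.5e7, 0.4)      # cells per cuvette
g418_ul = stock_volume_ul(500, 35, 60)      # ul of G418 stock per 35 ml plate

print(f"DNA per cuvette: {dna_ug:.1f} ug")
print(f"cells per cuvette: {cells:.2e}")
print(f"G418 stock per plate: {g418_ul:.1f} ul")
```

Under these figures each cuvette holds about 16 µg of DNA and 1 x 10^7 cells,
and each 35 ml plate needs roughly 292 µl of the G418 stock.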

Example 2: Use of Ionizing Irradiation to Increase the Frequency and 
Randomness of DNA Integration 

Method: HH1 cells were harvested at 90% confluency, washed in 1X PBS,
and resuspended at a cell concentration of 7.5 x 10^6 cells/ml in 1X PBS. 15 µg
linearized DNA (pRIG-1) was added to the cells and mixed. 400 µl was added to
each electroporation cuvette and pulsed at 250 Volts, 600 µFarads, 50 Ohms
using an electroporation apparatus (Biorad). Following the electric pulse, the cells
were incubated at room temperature for 10 minutes, and then placed into 2.5 ml
αMEM/10% FBS/1X penstrep. 300 µl of cells from each shock were irradiated at
0, 50, 500, and 5000 rads immediately prior to or at either 1 hour or 4 hours post
transfection. Immediately following irradiation, the cells were plated onto tissue
culture plates in complete medium. At 24 hours post plating, G418 was added to
the culture to a final concentration of 500 µg/ml. At 7 days post-selection, the
culture medium was replaced with fresh complete medium containing 500 µg/ml
G418. At 10 days post selection, medium was removed from the plate, the
colonies were stained with Coomassie Blue/90% methanol/10% acetic acid and
colonies with greater than 50 cells were counted.

Example 3: Use of Restriction Enzymes to Generate Random, Semi-random,
or Targeted Breaks in the Genome 

Method: HH1 cells were harvested at 90% confluence, washed in 1X PBS,
and resuspended at a cell concentration of 7.5 x 10^6 cells/ml in 1X PBS. To test
the efficiency of integration, 15 µg linearized DNA (PGK-βgeo) was added to
each 400 µl aliquot of cells and mixed. To several aliquots of cells, restriction
enzymes XbaI, NotI, HindIII, I-PpoI (10-500 units) were then added to separate
cell/DNA mixtures. 400 µl was added to each electroporation cuvette and pulsed
at 250 Volts, 600 µFarads, 50 Ohms using an electroporation apparatus (BioRad).
Following the electric pulse, the cells were incubated at room temperature for 10
minutes, and then placed into 2.5 ml αMEM/10% FBS/1X penstrep. 300 µl of 2.5
ml total cells from each shock were plated onto tissue culture plates in complete
media. At 24 hours post plating, G418 was added to the culture to a final
concentration of 600 µg/ml. At 7 days post-selection, the media was replaced
with fresh complete media containing 600 µg/ml G418. At 10 days post selection,
media was removed from the plate, the colonies were stained with Coomassie
Blue/90% methanol/10% acetic acid and colonies with greater than 50 cells were
counted.

Example 4: Amplification by Selecting for Two Amplifiable Markers
Located on the Integrated Vector

Following integration of the vector into the genome of a host cell, the
genetic locus may be amplified in copy number by simultaneous or sequential
selection for one or more amplifiable markers located on the integrated vector. For
example, a vector comprising two amplifiable markers may be integrated into the
genome, and expression of a given gene (i.e., a gene located at the site of vector
integration) can be increased by selecting for both amplifiable markers located on
the vector. This approach greatly facilitates the isolation of clones of cells that
have amplified the correct locus (i.e., the locus containing the integrated vector).

Once the vector has been integrated into the genome by nonhomologous
recombination, individual clones of cells containing the vector integrated in a
unique location may be isolated from other cells containing the vector integrated
at other locations in the genome. Alternatively, mixed populations of cells may
be selected for amplification.

Cells containing the integrated vector are then cultured in the presence of
a first selective agent that is specific for the first amplifiable marker. This agent
selects for cells that have amplified the amplifiable marker either on the vector or
on the endogenous chromosome. These cells are then selected for amplification
of the second selectable marker by culturing the cells in the presence of a second
selective agent that is specific for the second amplifiable marker. Cells that
amplified the vector and flanking genomic DNA will survive this second selective
step, whereas cells that amplified the endogenous first amplifiable marker or that
developed non-specific resistance will not survive. Additional selections may be
performed in similar fashion when vectors containing more than two (e.g., three,
four, five, or more) amplifiable markers are integrated into the cell genome, by
sequential culturing of the cells in the presence of selective agents that are specific
for the additional amplifiable markers contained on the integrated vector.
Following selection, surviving cells are assayed for level of expression of a desired
gene, and the cells expressing the highest levels are chosen for further
amplification. Alternatively, pools of cells resistant to both (if two amplifiable
markers are used) or all (if more than two amplifiable markers are used) of the
selective agents may be further cultured without isolation of individual clones.
These cells are then expanded and cultured in the presence of higher
concentrations of the first selective agent (usually twofold higher). The process
is repeated until the desired expression level is obtained.

Alternatively, cells containing the integrated vector may be selected 
simultaneously for both (if two are used) or all (if more than two are used) of the 
amplifiable markers. Simultaneous selection is accomplished by incorporating both
selection agents (if two markers are used) or all of the selection agents (if more 
than two markers are used) into the selection medium in which the transfected 
cells are cultured. The majority of surviving cells will have amplified the 
integrated vector. These clones can then be screened individually to identify the 
cells with the highest expression level, or they can be carried as a pool. A higher 
concentration of each selective agent (usually twofold higher) is then applied to 
the cells. Surviving cells are then assayed for expression levels. This process is 
repeated until the desired expression levels are obtained. 

By either selection strategy (i.e., simultaneous or sequential selection), the
initial concentration of selective agent is determined independently by titrating the
agent from low concentrations with no cytotoxicity to high concentrations that
result in cell death in the majority of cells. In general, a concentration that gives
rise to discrete colonies (e.g., several hundred colonies per 100,000 cells plated)
is chosen as the initial concentration.
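
The titration and twofold escalation described in this Example can be sketched as a short calculation. The following is only an illustrative sketch: the function names, the titration data, and the 200-900 colony window used to stand in for "several hundred colonies per 100,000 cells plated" are assumptions made for illustration and do not appear in the specification.

```python
def pick_initial_concentration(titration, cells_plated, target=(200, 900)):
    """Return the lowest tested concentration whose colony yield, normalized
    to 100,000 cells plated, falls inside the target window (else None).

    `titration` maps concentration -> colonies counted. The default window
    is an assumed reading of "several hundred colonies per 100,000 cells".
    """
    low, high = target
    for conc in sorted(titration):
        per_100k = titration[conc] * 100_000 / cells_plated
        if low <= per_100k <= high:
            return conc
    return None


def escalation_schedule(initial, rounds, fold=2.0):
    """Concentration applied in each successive round of selection,
    raising the dose by `fold` (usually twofold) per round."""
    if initial <= 0 or rounds < 1 or fold <= 1:
        raise ValueError("need initial > 0, rounds >= 1, fold > 1")
    return [initial * fold ** i for i in range(rounds)]


if __name__ == "__main__":
    # Hypothetical titration: concentration (arbitrary units) -> colonies
    titration = {1: 40_000, 5: 6_000, 10: 450, 20: 12, 40: 0}
    start = pick_initial_concentration(titration, cells_plated=100_000)
    print(start, escalation_schedule(start, rounds=4))
```

In practice the number of rounds is not fixed in advance; selection is repeated until the desired expression level is obtained.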



Example 5: Isolation of cDNAs Encoding Transmembrane Proteins 

pRIG8R1-CD2 (Fig. 5A-5D; SEQ ID NO:7), pRIG8R2-CD2 (Fig. 6A-6C;
SEQ ID NO:8), and pRIG8R3-CD2 (Fig. 7A-7C; SEQ ID NO:9) vectors contain
the CMV immediate early gene promoter operably linked to an exon followed by 
an unpaired splice donor site. The exon on the vector encodes a signal peptide 
linked to the extra-cellular domain of CD2 (lacking an in frame stop codon). Each 
vector encodes CD2 in a different reading frame relative to the splice donor site. 

To create a library of activated genes, 2 x 10^7 cells were irradiated with
50 rads from a 137Cs source and electroporated with 15 ug of linearized
pRIG8R1-CD2 (SEQ ID NO:7). Separately, this was repeated with
pRIG8R2-CD2 (SEQ ID NO:8), and again with pRIG8R3-CD2 (SEQ ID NO:9).
Following transfection, the three groups of cells were combined and plated into
150 mm dishes at 5 x 10^6 transfected cells per dish to create library #1. At 24
hours post transfection, library #1 was placed under 500 ug/ml G418 selection for
14 days. Drug resistant clones containing the vector integrated into the host cell
genome were combined, aliquoted, and frozen for analysis. Library #2 was
created as described above, except that 3 x 10^7 cells, 3 x 10^7 cells and 1 x 10^7 cells
were transfected with pRIG8R1-CD2, pRIG8R2-CD2, and pRIG8R3-CD2,
respectively.

To isolate cells containing activated genes encoding integral membrane
proteins, 3 x 10^6 cells from each library were cultured and treated as follows:

Cells were trypsinized using 4 ml of Trypsin-EDTA.

After the cells had released, the trypsin was neutralized by addition
of 8 ml of alpha MEM/10% FBS.

The cells were washed once with sterile PBS and collected by
centrifugation at 800 x g for 7 minutes.

The cell pellet was resuspended in 2 ml of alpha MEM/10% FBS.
1 ml was used for sorting while the other 1 ml was replated in
alpha MEM/10% FBS containing 500 ug/ml G418, expanded and
saved.

The cells used for sorting were washed once with sterile alpha
MEM/10% FBS and collected by centrifugation at 800 x g for
7 minutes.

The supernatant was removed and the pellet resuspended in 1 ml
of alpha MEM/10% FBS. 100 ul of these cells was removed for
staining with the isotype control.

200 ul of Anti-CD2 FITC (Pharmingen catalog # 30054X) was
added to the 900 ul of cells while 20 ul of the Mouse IgG isotype
control (Pharmingen catalog # 33814X) was added to the 100 ul
of cells. The cells were incubated, on ice, for 20 minutes.

To the tube that contained the cells stained with the Anti-Human
CD2 FITC, 5 ml of PBS/1% FBS were added. To the isotype
control, 900 ul of PBS/1% FBS were added. The cells were
collected by centrifugation at 600 x g for 6 minutes.

The supernatant from the tubes was removed. The cells that had
been stained with the isotype control were resuspended in 500 ul
of alpha MEM/10% FBS, and the cells that had been stained with
anti-CD2-FITC were resuspended in 1.5 ml alpha MEM/10%
FBS.

Cells were sorted through five sequential sorts on a FACS Vantage Flow
Cytometer (Becton Dickinson Immunocytometry Systems; Mountain View, CA).
In each sort, the indicated percentage of total cells, representing the most strongly
fluorescent cells (see below), were collected, expanded, and resorted. HT1080
cells were sorted as a negative control. The following populations were sorted
and collected in each sort:





          Library #1                 Library #2                       Library #3

Sort #1   500,000 cells collected    100,000 cells collected          40,000 cells collected
          (top 10%)                  (top 10%)                        (top 10%)

Sort #2   300,000 cells collected    220,000 cells collected          14,000 cells collected
          (top 5%)                   (top 11%)                        (top 5%)

Sort #3   90,000 cells collected     40,000 cells collected           120,000 cells collected
          (top 5%)                   (top 10%)                        (top 10%)

Sort #4   600,000 cells collected    (a) 6,000 cells collected        280,000 cells collected
          (top 40%)                  (top 5%);                        (top 13%)
                                     (b) 10,000 cells collected
                                     (next 5%)

Sort #5   (a) 260,000 cells          (a) from group (a) of sort #4,   (Not done)
          collected (top 10%);       100,000 cells collected (top
          (b) 530,000 cells          10%), and 350,000 cells
          collected (next 25%)       collected (next 35%);
                                     (b) from group (b) of sort #4,
                                     120,000 cells collected (top 10%)



Cells from each of the final sorts for each library were expanded and stored in 
liquid nitrogen. 
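
Because each sort collected a stated percentage of the total cells, the collected counts imply roughly how many cells passed through the cytometer in that sort. The sketch below works this out for the first three sorts of Library #1. It is a rough estimate only: the function name is hypothetical, and the calculation assumes that every cell falling inside the gate was recovered, which the specification does not state.

```python
def implied_input_cells(collected, gate_percent):
    """Estimate the cells analyzed in a sort from the number collected and
    the gate width (percent of total cells). Assumes complete recovery of
    gated cells, which is an idealization."""
    if not 0 < gate_percent <= 100:
        raise ValueError("gate_percent must be in (0, 100]")
    return round(collected * 100 / gate_percent)


# (sort, cells collected, gate percent) as reported for Library #1
LIBRARY_1 = [
    ("Sort #1", 500_000, 10),
    ("Sort #2", 300_000, 5),
    ("Sort #3", 90_000, 5),
]

if __name__ == "__main__":
    for name, collected, pct in LIBRARY_1:
        print(name, implied_input_cells(collected, pct))
```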



Isolation of activated genes from FACS-sorted cells

Once cells had been sorted as described above, activated endogenous
genes from the sorted cells were isolated by PCR-based cloning. One of ordinary
skill will appreciate, however, that any art-known method of cloning of genes may
be equivalently used to isolate activated genes from FACS-sorted cells.
Genes were isolated by the following protocol:

(1) Using the PolyATract System 1000 mRNA isolation kit (Promega), mRNA
was isolated from 3 x 10^7 CD2+ cells (sorted 5 rounds by FACS, as
described above) from libraries #1 and #2.

(2) After mRNA isolation, the concentration of mRNA was determined by
diluting 0.5 ul of isolated mRNA into 99.5 ul water and measuring OD260.
21 ug of mRNA were recovered from the CD2+ cells.

(3) First strand cDNA synthesis was then carried out as follows: 

(a) While the PCR machine was holding at 4°C, first strand
reaction mixtures were set up by sequential addition of the
following components:

41 ul DEPC-treated ddH2O
4 ul 10 mM each dNTP
8 ul 0.1 M DTT
16 ul 5x MMLV first strand buffer (Gibco-BRL)
5 ul (10 pmol/ul) of the consensus polyadenylation site
primer GD.R1 (SEQ ID NO:10)*
1 ul RNAsin (Promega)
3 ul (1.25 ug/ul) mRNA.

*Note: GD.R1, 5' TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT 3'
(SEQ ID NO:10), is a "Gene Discovery" primer for first strand cDNA
synthesis of mRNA; this primer is designed to anneal to the poly-adenylation
signal AATAAA and downstream poly-A region. This primer will introduce a
NotI site into the first strand.



Once samples had been made up, they were incubated as follows:

(b) 70° for 1 min.

(c) 42° hold.

2 ul of 400 U/ul Superscript II (Gibco-BRL; Rockville, MD) was
then added to each sample, to give a final total volume of 82 ul.
After approximately three minutes, samples were incubated as
follows:

(d) 37° for 30 min.

(e) 94° for 2 min.

(f) 4° for 5 min.

2 ul of 20 U/ul RNace-It (Stratagene) was then added to each
sample, and samples were incubated at 37° for 10 min.

(4) Following first strand synthesis, cDNA was purified using a PCR cleanup
kit (Qiagen) as follows:

(a) 80 ul of the first strand reaction were transferred to a 1.7
ml siliconized eppendorf tube and 400 ul of PB were added.

(b) Samples were then transferred to a PCR clean-up column
and centrifuged for two minutes at 14,000 RPM.

(c) Columns were then disassembled, flowthrough decanted,
750 ul of PE were added to pellets, and tubes were
centrifuged for two minutes at 14,000 RPM.

(d) Columns were disassembled and flowthrough decanted,
and tubes then centrifuged for two minutes at 14,000 RPM
to dry resin.

(e) cDNA was then eluted using 50 ul of EB by
transferring the column to a new siliconized eppendorf tube,
which was then centrifuged for two minutes at 14,000
RPM.

(5) Second strand cDNA synthesis was then carried out as follows: 

(a) Second strand reaction mixtures were set up at RT, 
through the sequential addition of the following 
components: 

ddH2O                        55 ul
10x PCR buffer               10 ul
50 mM MgCl2                   5 ul
10 mM dNTPs                   2 ul
25 pmol/ul RIG.F751-Bio*      4 ul
25 pmol/ul GD.R2**            4 ul
First strand product         20 ul

*Note: RIG.F751-Bio, 5' Biotin-CAGATCACTAGAAGCTTTATTGCGG 3'
(SEQ ID NO:11), anneals at the cap-site of the transcript expressed from pRIG
vectors.

**Note: GD.R2, 5' TTTTCGTCAGCGGCCGCATC 3' (SEQ ID NO:12), is a
primer used to PCR amplify cDNAs generated using primer GD.R1 (SEQ ID
NO:10). GD.R2 is a sub-sequence of GD.R1 with matching sequence up to the
degenerate bases preceding the polyA signal sequence.

(b) Start second strand synthesis:

94°C for 1 min;

add 1 ul Taq (5 U/ul; Gibco-BRL);

add 1 ul Vent DNA pol (0.1 U/ul, New England
Biolabs).

(c) Incubate at 63°C for 2 min.

(d) Incubate at 72°C for 3 min.

(e) Repeat step (b) four times.

(f) Incubate at 72°C for 6 min.

(g) Incubate at 4°C (hold)

(h) END 

(6) 200 ul of 1 mg/ml Streptavidin-Paramagnetic Particles (SA-PMP) were
then prepared by washing three times with STE.

(7) The products of the second strand reaction were added directly to the
SA-PMPs and incubated at RT for 30 minutes.

(8) After binding, SA-PMPs were collected through the use of the magnet,
and flowthrough material recovered.

(9) Beads were washed three times with 500 ul STE.

(10) Beads were resuspended in 50 ul of STE and collected at the bottom of
the tube using the magnet. STE supernatant was then carefully pipetted
off.

(11) Beads were resuspended in 50 ul of ddH2O and placed into a 100°C water
bath for two minutes, to release purified cDNA from PMPs.

(12) Purified cDNA was recovered by collecting PMPs on the magnet and
carefully removing the supernatant containing the cDNA.

(13) Purified products were transferred to a clean tube and centrifuged at
14,000 RPM for two minutes to remove all of the residual PMPs.

(14) A PCR reaction was then carried out to specifically amplify RIG activated
cDNAs, as follows:

(a) PCR reaction mixtures were set up at RT, through the
sequential addition of the following components:

H2O                          59 ul
10x PCR buffer               10 ul
50 mM MgCl2                   5 ul
10 mM dNTPs                   2 ul
25 pmol/ul RIG.F781*          2 ul
25 pmol/ul GD.R2              2 ul
second strand product        20 ul

*Note: RIG.F781, 5' ACTCATAGGCCATAGAGGCCTATCACAGTTAAATTGCTAACGCAG 3'
(SEQ ID NO:13), anneals downstream of GD.F1, GD.F3, GD.F5-Bio, and
RIG.F751-Bio, and adds an SfiI site for 5' cloning of
cDNAs. This primer is used in nested PCR amplification of RIG Exon1 specific
second strand cDNAs.

(b) Start thermal cycler:
94°C for 3 min;

add 1 ul of Taq (5 U/ul; Gibco-BRL);
add 1 ul of 0.1 U/ul Vent DNA polymerase (New England
Biolabs)

PCR was then carried out by 10 cycles of steps (c) to (e):

(c) 94°C for 30 sec.

(d) 60°C for 40 sec.

(e) 72°C for 3 min.

PCR was then completed by carrying out the following steps:

(f) 94°C for 30 sec.

(g) 60°C for 40 sec.

(h) 72°C for 3 min.

(i) 72°C + 20 sec each cycle for 10 cycles

(j) 72°C for 5 min

(k) 4°C hold.

(15) After elution of library material with 50 ul EB, samples were digested by
adding 10 ul of NEB Buffer 2, 40 ul of dH2O and 2 ul of SfiI and
digesting for 1 hour at 50°C, to cut the 5' end of the cDNA at the SfiI site
encoded by the forward primer (RIG.F781; SEQ ID NO:13).

(16) Following SfiI digestion, 5 ul of 1 M NaCl and 2 ul of NotI were added to
each sample, and samples digested for one hour at 37°C, to cut the 3' end
of the cDNA at the NotI site encoded by the first strand primer (GD.R1;
SEQ ID NO:10).

(17) The digested cDNA was then separated on a 1% low melt agarose gel.
cDNAs ranging in size from 1.2 Kb to 8 Kb were excised from the gel.



(18) cDNA was recovered from the excised agarose gel using Qiaex II Gel
Extraction (Qiagen). 2 ul of cDNA (approximately 30 ng) was ligated to
7 ul (35 ng) of pBS-HSB (linearized with SfiI/NotI) in a total volume of
10 ul of 1X T4 ligase buffer (NEB), using 400 units of T4 DNA ligase
(NEB).

(19) 0.5 ul of the ligation reaction mixture from step (18) was transformed
into E. coli DH10B.

(20) 103 colonies/0.5 ul ligated DNA were recovered.

(21) These colonies were screened for exons using the primers M13F20 and
JH182 (RIG Exon1 specific) through PCR in 12.5 ul volumes as follows:

(a) 100 ul of LB (with selective antibiotic) were dispensed 
into the appropriate number of 96-well plates. 

(b) Single colonies were picked and inoculated into individual 
wells of the 96-well plate, and the plate placed into a 37°C 
incubator for 2-3 hours without shaking. 



(c) A PCR reaction "master mix" was prepared on ice, as
follows:

# of 96-Well Plates:              1 Plate    2 Plates   3 Plates   4 Plates

Total # of 12.5 ul PCR rxns:      96         192        288        384

dH2O                              735 ul     1.47 ml    2.20 ml    2.94 ml
5X PCR Premix-4                   250 ul     500 ul     750 ul     1.0 ml
F Primers premix (25 pmol/ul)     10 ul      20 ul      30 ul      40 ul
R Primers premix (25 pmol/ul)     10 ul      20 ul      30 ul      40 ul
RNace-It Cocktail                 3.2 ul     6.3 ul     9.6 ul     12.8 ul
Taq Polymerase (5 U/ul)           3.2 ul     6.3 ul     9.6 ul     12.8 ul

Total Volume (ml)                 1.01       2.02       3.03       4.04

(d) 10 ul of the master mix were dispensed into each well of
the PCR reaction plate.

(e) 2.5 ul from each 100 ul E. coli culture were transferred
into the corresponding wells of the PCR reaction plate.

(f) PCR was performed, using typical PCR cycle conditions of: 

(i) 94°C/2min. (Bacterial lysis and plasmid denaturation) 

(ii) 30 cycles of 92 ° C denaturation for 15 sec; 60 ° C primer 
annealing for 20 sec; and 72 °C primer extension for 
40 sec 

(iii) 72 °C final extension for 5 min. 

(iv) 4°C hold. 

(g) Bromophenol blue was then added to the PCR reaction;
samples were mixed, centrifuged, and then the entire
reaction mix was loaded onto an agarose gel.

(23) Of 200 clones screened, 78% were positive for the vector exon. 96 of
these clones were grown as minipreps and purified using a Qiagen 96-well
turbo-prep following the Qiagen Miniprep Handbook (April 1997).

(24) Many duplicate clones were eliminated through simultaneous digestion of
2 ul of DNA with NotI, BamHI, XhoI, XbaI, HindIII, EcoRI in
NEB Buffer 3, in a total volume of 22 ul, followed by electrophoresis on
a 1% agarose gel.
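
The master mix in step (21)(c) is prepared in slight excess of what step (21)(d) dispenses. The following sketch compares the "Total Volume" row of the master-mix table with the 10 ul dispensed into each of 96 wells per plate. The names are hypothetical, and reading the difference as pipetting overage is an interpretation rather than something the protocol states.

```python
WELLS_PER_PLATE = 96
MIX_PER_WELL_UL = 10.0  # master mix dispensed per well, step (21)(d)

# "Total Volume (ml)" row of the master-mix table, keyed by plate count
PREPARED_ML = {1: 1.01, 2: 2.02, 3: 3.03, 4: 4.04}


def dispensed_ul(plates):
    """Master mix actually pipetted into wells for the given plate count."""
    return plates * WELLS_PER_PLATE * MIX_PER_WELL_UL


def overage_fraction(plates):
    """Fractional excess of prepared mix over dispensed mix."""
    prepared_ul = PREPARED_ML[plates] * 1000.0
    return prepared_ul / dispensed_ul(plates) - 1.0


if __name__ == "__main__":
    for n in sorted(PREPARED_ML):
        print(n, dispensed_ul(n), round(overage_fraction(n) * 100, 1))
```

Each reaction is 12.5 ul because 2.5 ul of E. coli culture is added to the 10 ul of master mix in step (21)(e).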



Results: 

Two different cDNA libraries were screened using this protocol. In the
first library (TMT#1), eight of the isolated activated genes were sequenced. Of
these eight genes, four genes encoded known integral membrane proteins and six
were novel genes. In the second library (TMT#2), 11 isolated activated genes
were sequenced. Of these 11 genes, one gene encoded a known integral
membrane protein, one gene encoded a partially sequenced gene homologous to
an integral membrane protein, and nine were novel genes. In all cases where the
isolated gene corresponded to a characterized known gene, that gene encoded an integral
membrane protein.

Exemplary significant alignments (obtained from GenBank) for genes 
isolated from each library are shown below: 



TMT#1 Significant Alignments:

>gi|179761|gb|M76559|HUMCACNLB Human neuronal DHP-sensitive
voltage-dependent, calcium channel alpha-2b subunit mRNA
complete cds.
Length = 3600

>gi|3183974|emb|Y10183|HSMEMD H. sapiens mRNA for MEMD protein
Length = 4235

TMT#2 Significant Alignments:

>gi|476590|gb|U06715|HSU06715 Human cytochrome B561, HCYTO B561, mRNA,
partial cds.
Length = 2463

>gi|2184843|gb|AA459959|AA459959 zx66c01.s1 Soares total fetus
Nb2HF8 9w Homo sapiens cDNA clone 796414 3' similar to
gb:J03171 INTERFERON-ALPHA RECEPTOR PRECURSOR (HUMAN);
Length = 431

Example 6: Activation of Endogenous Genes using a Poly(A) Trap Vector 

HT1080 cells (1 x 10^7 cells) were irradiated with 50 rads using a 137Cs
source and electroporated with 15 ug linearized pRIG14 (Figure 29A-29B).
Following transfection, the cells were plated into a 150 mm dish at 5 x 10^6
cells/dish. At 24 hours, puromycin was added to 3 ug/ml. The cells were
incubated at 37°C for 12 days in the presence of 3 ug/ml puromycin. The media
was replaced every 5 days. At 12 days, the number of colonies was counted, and
the cells were trypsinized and replated onto a new dish. The cells were grown to
90% confluency and harvested for frozen storage and gene isolation. Typically,
1000-3000 colonies were produced per 1 x 10^7 cells transfected.

Example 7: Activation of Endogenous Genes Using a Dual Poly(A) 
Trap/SAT Vector 

1 x 10^7 HH1 cells (HPRT-minus HT1080 cells) were irradiated with 50
rads using a 137Cs source and electroporated with 15 ug linearized pRIG-22.
Following transfection, the cells were plated into a 150 mm dish at 5 x 10^6
cells/dish. At 24 hours, G418 was added to 500 ug/ml. The cells were
incubated at 37°C for 4 days in the presence of 500 ug/ml G418. The media was
replaced with fresh media containing 500 ug/ml G418 and AgThg and grown in
the presence of both drugs for an additional 7 days. Alternatively, as a control for
HPRT activity, the media was replaced with fresh media containing 500 ug/ml
G418 and HAT (available from Life Technologies, Inc., Rockville, MD, and used
at manufacturer's recommended concentration) and grown in the presence of both
drugs for an additional 7 days. At 12 days post transfection, the number of
colonies was counted, and the cells were trypsinized and replated onto a new dish.
The cells were grown to 90% confluency and harvested for frozen storage and
gene isolation. Typically, cells subjected to G418/AgThg selection produced
1000-3000 colonies per 1 x 10^7 cells transfected. In contrast, cells subjected to
G418/HAT selection produced approximately 100 colonies per 1 x 10^7 cells
transfected.

Example 8: Isolation of activated genes 

Non-targeted gene activation vectors are integrated into the genome of a
eukaryotic cell using the methods of the invention. By integrating the vector into
multiple cells, a library is created in which cells are expressing different vector
activated genes. RNA is isolated from these cells using a commercial RNA
isolation kit. In this example, RNA is isolated from cells using Poly(A) Tract 1000
(Promega). The RNA is converted into cDNA, amplified, size fractionated, and
cloned into a plasmid for analysis and sequencing. A brief description of this
process is presented.

1) Place 4 ml GTC Extraction buffer (Poly(A) Tract 1000 Kit, Promega) in a 15
ml polycarbonate screw cap tube and add 168 ul 2-mercaptoethanol and place in
a 70°C water bath.

2) Place 8 ml dilution buffer in a 15 ml polycarbonate screw cap tube for every
pellet processed and add 168 ul 2-mercaptoethanol and place in a 70°C water
bath.

3) Remove from -80°C storage cell pellets (1 x 10^7 - 1 x 10^8 cells) containing
non-targeted gene activation vector integrated into their genome. Pipette 4 ml
GTC Extraction buffer immediately onto cell pellet. Pipette up-and-down several
times until the pellet is resuspended and transfer into a 15 ml snap cap
polypropylene tube.

4) Add the 8 ml dilution buffer and mix by inversion.

5) Add 10 ul (500 pmol) of the biotinylated oligo dT primer and mix.

6) Let sit at 70°C for 5 minutes, inverting every couple of minutes to ensure even
heating.

7) Centrifuge in a Sorvall HB-6 rotor at 7800 rpm (10k x g) at 25°C for 10
minutes. During this period of time wash 6 ml Streptavidin-Paramagnetic particles
(SA-PMPs) 3x with 6 ml 0.5x SSC through use of the Poly(A) Tract system 1000
magnet.

8) After 3 washes resuspend the SA-PMPs in 6 ml 0.5x SSC.

9) Pipette to remove the supernatant from the RNA prep and add to the
resuspended SA-PMPs (Be careful when removing supernatant so that you do not
disrupt the pellet).

10) Let the SA-PMP/RNA mix and incubate for 2 minutes at room temperature.

11) Capture the magnetic beads through use of the Poly(A) Tract system 1000
magnet. Note that it takes some time for all of the beads to pellet due to the high
viscosity of the liquid.

12) Pour off the supernatant and resuspend the beads in 1.7 ml of 0.5x SSC using
a 2 ml pipette and transfer to a 2 ml screw cap tube.

13) Capture the SA-PMPs using the magnet and remove the supernatant by
pipetting with a P1000.

14) Add 1.7 ml 0.5x SSC and invert the tube several times to mix.

15) Repeat steps 13 and 14 two more times.

16) Resuspend the SA-PMPs in 1 ml of nuclease free water and invert several
times to mix.

17) Capture the SA-PMPs and pipette off the mRNA.

18) Place 0.5 ml of the mRNA into each of two siliconized eppendorf tubes and
add 50 ul of DEPC-treated 3M NaOAc solution and 0.55 ml of isopropanol.
Invert several times to mix and place at -20°C for at least 4 hours.

19) Centrifuge the mRNA for 10 minutes at max RPM (14 k).

20) Carefully pipette off the supernatants and wash pellets with 200 ul 80%
ethanol through re-centrifugation for 2 minutes at 14K RPM. Note that the pellets
are often brown or tan in color. This color results from residual SA-PMPs.

21) Remove wash and let pellets air dry for not more than 10 minutes at room
temperature.

22) Resuspend pellets in 5 ul each and combine into a single tube.

23) Centrifuge at 14K RPM for 2 minutes to remove the residual SA-PMPs and
carefully remove the mRNA.

24) Determine the concentration of mRNA by diluting 0.5 ul into 99.5 ul water
and measuring OD260. Note that 1 OD260 = 40 ug/ml RNA.

25) Set up first strand reaction for both the test sample and the negative control 
(HT1080) through the sequential addition of the following components while the 
PCR machine is holding at 4°C: 

Step 1:

42 ul DEPC-treated ddH2O
4 ul 10 mM each dNTP
8 ul 0.1 M DTT
16 ul 5x MMLV 1st strand buffer
5 ul (10 pmol/ul) GDR1
1 ul RNAsin (Promega)
4 ul (1.25 ug/ul) mRNA.

Step 2: 70°/1 min

Step 3: 42°/hold

Step 4: After 1 minute add 2 ul SUPERSCRIPT II® (Life
Technologies, Inc.; Rockville, MD) and incubate at 37°C for 30 min

Step 5: 94°/2 min

Step 6: 4°/hold

Step 7: Add 2 ul RNase and incubate at 37°C for 10 min

Step 8: 4°/hold

26) Analyze 8 ul of cDNA on a 1% agarose gel to check for cDNA synthesis and
purify remaining cDNA using the PCR cleanup kit from Qiagen by transferring the
70 ul first strand reaction to a 1.5 ml siliconized eppendorf tube and adding 400 ul
PB.

27) Transfer to a PCR clean-up column and centrifuge 2 minutes at max RPM.

28) Disassemble column and pour out flow through. Add 750 ul PE and
centrifuge 2 minutes at max RPM.

29) Disassemble column and pour out flow through, then centrifuge 2 minutes
at max RPM to dry resin.

30) Elute using 50 ul of EB by transferring the column to a new siliconized
eppendorf tube and centrifuging for 2 minutes at max RPM.

31) Second Strand cDNA synthesis set up at RT:

H2O                       8.5 ul
10X PCR buffer            5 ul
50 mM MgCl2               2.5 ul
10 mM dNTPs               1 ul
25 pmol/ul GDF5Bio        10 ul
25 pmol/ul GDR2           10 ul
First strand product      15 ul

Step 9: 94°C/1 min.

Step 10: 60°C/10 min.

Add 0.25 ul Taq polymerase

Step 11: 60°C/2 min.

Step 12: 72°C/10 min.

Step 13: 94°C/1 min.

Step 14: go to "Step 11" four more times

Step 15: 60°C/2 min

Step 16: 72°C/10 min

Step 17: END

32) Prepare 100 ul of SA-PMPs by washing 3x with STE and collection using
a magnet. After the final wash, resuspend the beads in 150 ul STE.

33) Purify the products of the second strand reaction using the PCR cleanup kit
from Qiagen. Elute in 50 ul EB and add the products of the second strand reaction
to 150 ul of the PMPs.

34) Mix gently at RT for 30 minutes.

35) After binding collect SA-PMPs through use of a magnet and recover flow
through material (SAVE THIS MATERIAL!)

36) Wash the beads 3x with 500 ul STE and 1x with NEB 2 (1x).

37) Resuspend the beads in 100 ul NEB 2 (1x).

38) Add 2 ul SfiI and digest at 50°C for 30 minutes with gentle mixing every 10
minutes.

39) Recover purified cDNA through use of a magnet and carefully removing the
supernatant.

40) Transfer the products to a new tube and centrifuge at maximum RPM for 2
minutes to remove all of the beads.

41) Set up a PCR reaction to specifically amplify RAGE activated cDNAs:

H2O                       37 ul
10X PCR buffer            10 ul
10 mM dNTPs               2 ul
25 pmol/ul GDF781         10 ul
25 pmol/ul GDR2           10 ul
Second strand product     25 ul

Step 1:   94°C/2 min.
Step 2:   94°C/45 sec.
Step 3:   60°C/10 min.
          Add 0.5 ul Taq Polymerase
Step 4:   72°C/10 min.
Step 6:   60°C/2 min.
Step 7:   72°C/10 min.
Step 8:   Cycle to step 5, 8 more times
Step 9:   94°C/45 sec.
Step 10:  60°C/2 min.
Step 11:  72°C/10 min. + 20 sec each cycle
Step 12:  Cycle to step 9, 14 more times
Step 13:  72°C/5 min.
Step 14:  4°C hold

42) Check specificity of PCR amplification of HT1080 versus library material
through analysis on a 1% agarose gel. If there is a high specificity of cDNA
amplification, then use Qiagen PCR clean up kit to purify PCR products.

43) After elution of library material with 50 ul EB add 10 ul NEB 2, 40 ul dH2O
and 2 ul SfiI and digest for 1 hour at 50°C.

44) Add 5 ul of 1 M NaCl and 2 ul of NotI and digest for 1 hour at 37°C.

45) Prepare a 1% L.M. agarose gel and run library material on the gel. After
visualization of material, cut out fragments ranging in size from 500 bp to 10 Kb.

46) Recover the library DNA from agarose using Qiaex II Gel Extraction
Protocol (Qiagen) and elute DNA in 10 ul EB. Ligate 5 ul of this material to 4 ul
pBS-HSB (SfiI/NotI) or pBS-SNS in a total volume of 10 ul.

47) Transform E. coli with 0.5 ul ligated DNA per 40 ul cells.

48) Pick colonies, grow overnight in LB, isolate plasmids.

49) Analyze gene activated cDNA inserts by restriction digest and DNA
sequencing.
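
The OD260 reading taken in step 24 converts to a stock concentration as sketched below. The sketch applies the 40 ug/ml per OD260 unit conversion together with the 0.5 ul into 99.5 ul dilution (200-fold). The function name and the example reading are hypothetical, and a 1 cm path length is assumed.

```python
def mrna_conc_ug_per_ul(od260, sample_ul=0.5, diluent_ul=99.5,
                        ug_per_ml_per_od=40.0):
    """Concentration of the undiluted mRNA stock in ug/ul.

    Assumes a 1 cm path length and that one OD260 unit corresponds to
    40 ug/ml of RNA; the dilution factor is (sample + diluent) / sample.
    """
    if od260 < 0 or sample_ul <= 0 or diluent_ul < 0:
        raise ValueError("invalid reading or volumes")
    dilution = (sample_ul + diluent_ul) / sample_ul  # 200-fold by default
    ug_per_ml = od260 * ug_per_ml_per_od * dilution
    return ug_per_ml / 1000.0  # ug/ml -> ug/ul


if __name__ == "__main__":
    # Hypothetical spectrophotometer reading, not a value from the protocol
    print(mrna_conc_ug_per_ul(0.15))
```

With the default dilution, each 0.1 OD260 unit corresponds to 0.8 ug/ul of stock, which helps in judging whether the stock is concentrated enough for the first strand reaction in step 25.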

Example 9: Isolation of Activated Genes from Subtracted cDNA Pools 

Purified mRNA from non-transfected HT1080 cells was prepared using
the Poly-A Tract 1000 system (Promega), as described in Example 8 steps 1-24,
and was biotinylated using EZ-Link(TM) Biotin LC-ASA reagent (Pierce), as
follows:

1.) 25 ul DEPC-treated dH2O and 15 ul containing 10 ug of HT1080 mRNA were
added into a siliconized microfuge tube and held on ice.

2.) Working under subdued light, 40 ul of prepared LC-ASA stock reagent
(1 mg/ml in 100% ethanol) was added into the reaction tube.

3.) A UV light (365 nm wavelength) was positioned 5 cm above the microfuge
tube and used to irradiate the reaction mix for 15 minutes.

4.) Unlinked biotin reagent was removed from the labeled HT1080 mRNA by
passing the reaction mix through an RNase-free MicroSpin P-30 column
(BioRad), as prescribed by the manufacturer.

HT1080 cells were transfected with a poly(A) trap pRIG activation vector
and grown under selective media to produce a population of drug resistant
colonies, as described in Example 1. Purified mRNAs were prepared from the
pooled colonies using the Promega Poly-A Tract 1000 system, as described in
Example 8. First strand cDNA was prepared from 5 ug of this mRNA using oligo
GD.R1 (TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT) (SEQ ID
NO:10), as described in Example 8, Step 25. The reaction mix was passed
through a Qiagen PCR Quick Clean-up column and the purified 1st strand cDNA
was recovered in 100 ul EB.

The subtractive hybridization of biotinylated HT1080 mRNAs (subtractor 
population) and 1st strand cDNAs prepared from the superpool of pRIG- 
transfected colonies (target population) was performed as follows: 

1.) 9 ug of biotinylated mRNA was added into a 0.5 ml microfuge tube containing
0.5 ug 1st strand cDNA.

2.) 1/100x volume of 10 mg/ml glycogen, 1/10x volume of 3 M sodium acetate,
pH 5.5, and 2.6x volume of 100% ethanol were added into the tube and mixed.

3.) The tube was placed at -80°C for 1 hr, then spun in a refrigerated microfuge
for 20 minutes.

4.) The pellet of precipitated nucleic acids was drained, washed once with 70%
ethanol, then air-dried.

5.) The pellet was solvated in 5 ul HBS (50 mM HEPES, pH 7.6; 2 mM EDTA;
0.2% SDS; 500 mM NaCl) and overlayered with 5 ul light mineral oil, then heated
to 95°C for 2 minutes followed by 68°C for 24 hours.

6.) The reaction mix was diluted with 100 ul HB (HBS without SDS) and
extracted once with 100 ul chloroform to remove the oil.

7.) The diluted hybridization mix was added to 300 ul streptavidin-coated
paramagnetic particles (Promega) which had been pre-washed 3x in 300 ul HB.

8.) The mix was incubated 10 minutes at room temperature and the SA-PMPs
and bound Biotin-mRNA:DNA hybrids were removed from solution by magnetic
capture.

9.) Steps 7 and 8 were repeated once.

10.) The cleared solution was subjected to one additional round of subtractive
hybridization and magnetic removal of captured hybrids (Steps 1-9), with the
following exceptions:

Step 6: the hybridization reaction was diluted with 2x PCR Buffer
(40 mM Tris-HCl, pH 8.4; 100 mM KCl).
Step 7: PMPs were pre-washed in 1x PCR Buffer.
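
The proportions in Steps 1 and 2 can be worked through numerically. The
sketch below assumes that each multiple in Step 2 is taken relative to the
starting sample volume, and it uses a 100 ul sample purely as an illustration,
since the text does not state the sample volume. The function name
precipitation_additions is hypothetical.

```python
def precipitation_additions(sample_ul: float) -> dict:
    """Volumes (ul) to add per Step 2: 1/100x glycogen, 1/10x sodium acetate, 2.6x ethanol."""
    return {
        "glycogen_10_mg_per_ml": sample_ul / 100,
        "sodium_acetate_3M_pH5.5": sample_ul / 10,
        "ethanol_100_percent": sample_ul * 2.6,
    }


SUBTRACTOR_UG = 9.0  # biotinylated HT1080 mRNA (Step 1)
TARGET_UG = 0.5      # 1st strand cDNA (Step 1)

# Mass excess of subtractor over target in each hybridization round.
excess = SUBTRACTOR_UG / TARGET_UG

# 100 ul is an assumed sample volume, not a value from the text.
example = precipitation_additions(100.0)
print(excess, example)
```

With the stated masses the subtractor mRNA is in 18-fold mass excess over the
target cDNA in each round.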

The twice-subtracted 1st strand cDNA was used to generate 2nd strand
cDNA by combining 45 ul of 1st strand cDNA with 7 ul dH2O, 5 ul 50 mM
MgCl2, 2 ul premix of 10 mM each dNTP, 1 ul 10x PCR Buffer, 20 ul of 12.5
pmol/ul GD19.F1-Bio (5' Biotin-CTCGTTTAGTGCGGCCGCTCAG-
ATCACTGAATTCTGACGACCT) (SEQ ID NO: 14), 20 ul of 12.5 pmol/ul
GD.R2 (TTTTCGTCAGCGGCCGCATC) (SEQ ID NO: 12), and 0.5 ul Taq
Polymerase, with thermocycling as described in Example 8, Step 31. The second
strand cDNA product was amplified and further processed for the production of
an E. coli-based cDNA library, as described in Example 8, steps 32-49.
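
As a check on the second strand reaction described above, the component
volumes can be summed and the added reagents converted to final amounts. This
is a sketch under the assumption that the listed components are the entire
reaction; the MgCl2 figure counts only the MgCl2 added here and ignores any
magnesium carried in with the other components.

```python
# Component volumes (ul) as listed for the 2nd strand cDNA reaction.
components_ul = {
    "1st strand cDNA": 45.0,
    "dH2O": 7.0,
    "50 mM MgCl2": 5.0,
    "dNTP premix (10 mM each)": 2.0,
    "10x PCR Buffer": 1.0,
    "GD19.F1-Bio (12.5 pmol/ul)": 20.0,
    "GD.R2 (12.5 pmol/ul)": 20.0,
    "Taq Polymerase": 0.5,
}

total_ul = sum(components_ul.values())

# Amount of each primer in the reaction (pmol).
primer_pmol = 20.0 * 12.5

# Final concentrations from the added stocks (C1 * V1 / V_total).
mgcl2_mM = 50.0 * 5.0 / total_ul
dntp_mM_each = 10.0 * 2.0 / total_ul

print(total_ul, primer_pmol, round(mgcl2_mM, 2), round(dntp_mM_each, 3))
```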

Example 10: Selective Capture of RIG-activated Transcripts

HT1080 cells were transfected with pRIG19 activation vector
(Figure 30A-30C) and cultured for 2 weeks in selective media, as described in
Example 6. Total RNA was prepared from a pellet comprised of 10^8 cells using
TRIzol® Reagent (Life Technologies, Inc.; Rockville, MD) following the
manufacturer's protocol, and was dissolved in 720 ul of DEPC-treated dH2O
(dH2O-DEPC). Contaminating genomic DNA was eliminated from the RNA
preparation by mixing 80 ul NEB 10x Buffer 2, 8 ul Promega RNasin, and 20 ul
RQ1 Promega RNase-free DNase, incubating at 37°C for 30 minutes, extracting
sequentially with equal volumes of phenol:chloroform (1:1) and chloroform, mixing
with 1/10x volume sodium acetate (pH 5.5), precipitating the RNA with 2x
volume of 100% ethanol, and solvating the dried RNA pellet in dH2O-DEPC to a final
concentration of 4.8 ug/ul.

mRNA transcripts derived from pRIG19-activated genes were selectively
captured from the pool of total cellular RNAs by mixing in a 2 ml RNase-free
microfuge tube 150 ul total RNA, 150 ul HB-DEPC (50 mM HEPES, pH 7.6;
2 mM EDTA; 500 mM NaCl), 3 ul Promega RNasin, and 2.5 ul (25 pmol/ul)
oligo GD19.R1-Bio (see Table 1), then incubating at 70°C for 5 minutes followed
by 50°C for 15 minutes. One ml of Promega streptavidin coated paramagnetic
particles (SA-PMPs) was magnetically captured and washed 3x each with 1.5 ml
of 0.5x SSC, and the SA-PMPs were left without being resuspended. The warm
oligo:RNA hybridization reaction was added directly into the tube containing the
semi-dry SA-PMPs. After incubating for 10 minutes at room temperature the SA-
PMPs were washed 3x with 1 ml 0.5x SSC.
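
The composition of the capture hybridization can be worked out from the
volumes and stock concentrations given above. The sketch below is an
illustrative calculation only; it assumes the RNA stock is at the stated
4.8 ug/ul and that NaCl comes solely from the hybridization buffer.

```python
# Inputs taken from the capture hybridization described in the text.
rna_ul, rna_ug_per_ul = 150.0, 4.8        # total RNA
hb_ul, hb_nacl_mM = 150.0, 500.0          # hybridization buffer and its NaCl content
rnasin_ul = 3.0                           # RNase inhibitor
oligo_ul, oligo_pmol_per_ul = 2.5, 25.0   # biotinylated capture oligo

total_ul = rna_ul + hb_ul + rnasin_ul + oligo_ul
rna_ug = rna_ul * rna_ug_per_ul
oligo_pmol = oligo_ul * oligo_pmol_per_ul

# Final NaCl, assuming the buffer is the only source of NaCl.
nacl_mM = hb_nacl_mM * hb_ul / total_ul

print(total_ul, rna_ug, oligo_pmol, round(nacl_mM, 1))
```

On these assumptions the hybridization contains about 720 ug of total RNA and
62.5 pmol of capture oligo in roughly 305 ul, at a little under 250 mM NaCl.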



Table 1: Primer and Oligonucleotide Sequences

Category             Primer/Oligo Name  Sequence                                               SEQ ID NO:

Forward PCR Primers  GD19.F1-Bio        5' Biotin-CTCGTTTAGTGCGGCCGCTCAGATCACTGAATTCTGACGACCT  14
                     GD19.F2-Bio        5' Biotin-CTCGTTTAGTGGCGCGCCAGATCACTGAATTCTGACGACCT    15
                     GD19.F2            GACCTACTGATTAACGGCCATA                                 16

Reverse PCR Primers  GD.R1              TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT                 10
                     GD.R2              TTTTCGTCAGCGGCCGCATC                                   12

mRNA Capture Oligo   GD19.R1-Bio        TCGTCAGAATTCAGTGATCT-3' Biotin                         17
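
The relationships among the Table 1 oligos can be checked by simple string
operations. The sketch below looks for the NotI (GCGGCCGC), AscI (GGCGCGCC),
and EcoRI (GAATTC) recognition sequences in each oligo and tests whether the
capture oligo GD19.R1-Bio is the reverse complement of part of the forward
primers. The helper names are illustrative assumptions; because all three
sites are palindromic, searching one strand is sufficient.

```python
PRIMERS = {
    "GD19.F1-Bio": "CTCGTTTAGTGCGGCCGCTCAGATCACTGAATTCTGACGACCT",
    "GD19.F2-Bio": "CTCGTTTAGTGGCGCGCCAGATCACTGAATTCTGACGACCT",
    "GD19.F2": "GACCTACTGATTAACGGCCATA",
    "GD.R2": "TTTTCGTCAGCGGCCGCATC",
    "GD19.R1-Bio": "TCGTCAGAATTCAGTGATCT",
}

# Standard recognition sequences for the enzymes named in the examples.
SITES = {"NotI": "GCGGCCGC", "AscI": "GGCGCGCC", "EcoRI": "GAATTC"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def sites_in(seq: str) -> list:
    """Names of the recognition sequences present in seq, sorted alphabetically."""
    return sorted(name for name, site in SITES.items() if site in seq)


site_map = {name: sites_in(seq) for name, seq in PRIMERS.items()}

# Forward primers that contain the sequence the capture oligo would anneal to.
capture_rc = reverse_complement(PRIMERS["GD19.R1-Bio"])
capture_targets = [
    name for name in ("GD19.F1-Bio", "GD19.F2-Bio") if capture_rc in PRIMERS[name]
]

print(site_map)
print(capture_rc, capture_targets)
```

The result is consistent with the examples: GD19.F2-Bio carries an AscI site
(used for cleavage from the particles in Example 10), GD19.F1-Bio and GD.R2
carry NotI sites, and the capture oligo is antisense to a sequence shared by
both biotinylated forward primers.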



After the final magnetic capture, the SA-PMPs were suspended in 190
ul dH2O-DEPC and incubated at 68°C for 15 minutes. PMPs were immobilized
by exposure to a magnet and the cleared solution containing RIG-activated
transcripts was transferred to a microfuge tube. 63 ul of captured RIG-activated
transcript were transferred to a PCR tube where first and second strand cDNA
synthesis was performed using PCR program "1+2CDNA", as follows:

Step 1: 4°C/∞: Add into the PCR tube containing the RIG-
        activated transcripts 20 ul 5x GibcoBRL RT Buffer, 1 ul
        Promega RNasin, 10 ul 100 mM DTT, 5 ul dNTP premix
        at 10 mM each, 1 ul oligo GD.R1 (see Table 1) at
        25 pmol/ul.
Step 2: 70°C/3 minutes
Step 3: 42°C/10 minutes
Step 4: Add 2.5 ul SUPERSCRIPT II® (Life Technologies, Inc.),
        then incubate at 37°C/1 hour
Step 5: 94°C/2 minutes
Step 6: 4°C/∞.
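
The Step 1 additions bring the first strand reaction to a round volume, which
can be verified by summing the components. The sketch below computes the
total volume and the resulting final concentrations before the reverse
transcriptase is added in Step 4; it is an illustrative calculation and
assumes the 63 ul of captured transcript contributes no buffer, DTT, or dNTP.

```python
# Step 1 components (ul) of the first strand cDNA reaction.
components_ul = {
    "captured RIG-activated transcripts": 63.0,
    "5x RT Buffer": 20.0,
    "RNasin": 1.0,
    "100 mM DTT": 10.0,
    "dNTP premix (10 mM each)": 5.0,
    "GD.R1 (25 pmol/ul)": 1.0,
}

total_ul = sum(components_ul.values())

# Final strengths from stock concentration * stock volume / total volume.
rt_buffer_x = 5.0 * components_ul["5x RT Buffer"] / total_ul
dtt_mM = 100.0 * components_ul["100 mM DTT"] / total_ul
dntp_mM_each = 10.0 * components_ul["dNTP premix (10 mM each)"] / total_ul
primer_pmol = 25.0 * components_ul["GD.R1 (25 pmol/ul)"]

print(total_ul, rt_buffer_x, dtt_mM, dntp_mM_each, primer_pmol)
```

The components sum to 100 ul, giving 1x RT buffer, 10 mM DTT, 0.5 mM of each
dNTP, and 25 pmol of primer.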



To the 1st strand cDNA mix, 2 ul of Stratagene RNace-It was added and
the mixture was incubated at 37°C for 15 minutes. 600 ul of Qiagen PB reagent
was added to the reaction, which was then transferred to a Qiagen PCR clean-up
column and processed according to the manufacturer's protocol. cDNA was eluted
from the column in 50 ul EB and transferred to a PCR tube. The second strand cDNA
reaction was performed using oligos GD19.F2-Bio (Table 1) and GD.R2 (Table 1)
as described in Example 9. The second strand product was captured on Promega
SA-PMPs as described in Example 9, with the exception that the final suspension
of SA-PMPs was in 1x NEB 4 Buffer and the captured cDNAs were cleaved from
the particles using restriction endonuclease AscI. Amplification of the second
strand cDNA products using oligos GD19.F2 and GD.R2, digestion of the
amplified cDNAs using endonucleases SfiI and NotI, and size selection of cDNAs
prior to cloning were all performed as described in Example 9. The final cDNA
cleanup was achieved by eluting the cDNA pool off a Qiagen PCR Cleanup
column in 30 ul EB. 11 ul of cDNA was mixed with 4 ul 5x GibcoBRL Ligase
Buffer and 4 ul pGD5 vector DNA previously prepared by digestion with SfiI, NotI,
and CIP. 1 ul T4 DNA Ligase was added, and the reaction mix was incubated
at 16°C overnight. 1 ul of ligation reaction was used to transform electro-
competent E. coli DH10B cells, which were subsequently plated on LB agar plates
containing 12.5 ug/ml chloramphenicol. Typically, 60 to 80 bacterial colonies
were recovered per ul of ligation mix transformed.

Example 11: Selective Capture of RIG-activated Transcripts 

HT1080 cells were transfected with pRIG19 activation vector and cultured
for 2 weeks in selective media, as described in Example 6. Total RNA was
prepared from a pellet comprised of 10^8 cells using TRIzol® Reagent (Life
Technologies, Inc.) following the manufacturer's protocol, and was dissolved in
720 ul of DEPC-treated dH2O (dH2O-DEPC). Contaminating genomic DNA was
eliminated from the RNA preparation by mixing 80 ul NEB 10x Buffer 2, 8 ul
Promega RNasin, and 20 ul RQ1 Promega RNase-free DNase, incubating at 37°C
for 30 minutes, extracting sequentially with equal volumes of phenol:chloroform
(1:1) and chloroform, mixing with 1/10x volume sodium acetate (pH 5.5),
precipitating the RNA with 2x volume of 100% ethanol, and solvating the dried
RNA pellet in dH2O-DEPC to a final concentration of 4.8 ug/ul.

mRNA transcripts derived from pRIG19-activated genes were selectively
captured from the pool of total cellular RNAs by mixing in a 2 ml RNase-free
microfuge tube 150 ul total RNA, 150 ul HB-DEPC (50 mM HEPES, pH 7.6;
2 mM EDTA; 500 mM NaCl), 3 ul Promega RNasin, and 2.5 ul (25 pmol/ul)
oligo GD19.R1-Bio (see Table 1), then incubating at 70°C for 5 minutes followed
by 50°C for 15 minutes. One ml of Promega streptavidin coated paramagnetic
particles (SA-PMPs) was magnetically captured and washed 3x each with 1.5 ml
of 0.5x SSC, and the SA-PMPs were left without being resuspended. The warm
oligo:RNA hybridization reaction was added directly into the tube containing the
semi-dry SA-PMPs. After incubating for 10 minutes at room temperature the SA-
PMPs were washed 3x with 1 ml 0.5x SSC. After the final magnetic capture the
SA-PMPs were suspended in 190 ul dH2O-DEPC and incubated at 68°C for
15 minutes. PMPs were immobilized by exposure to a magnet and the cleared
solution containing RIG-activated transcripts was transferred to a microfuge tube.
63 ul of captured RIG-activated transcript were transferred to a PCR tube where
first and second strand cDNA synthesis was performed using PCR program
"1+2CDNA", as follows:

Step 1: 4°C/∞: Add into the PCR tube containing the RIG-
        activated transcripts 20 ul 5x GibcoBRL RT Buffer, 1 ul
        Promega RNasin, 10 ul 100 mM DTT, 5 ul dNTP premix
        at 10 mM each, 1 ul oligo GD.R1 (see Table 1) at
        25 pmol/ul.
Step 2: 70°C/3 minutes
Step 3: 42°C/10 minutes
Step 4: Add 2.5 ul SUPERSCRIPT II® (Life Technologies, Inc.),
        then incubate at 37°C/1 hour
Step 5: 94°C/2 minutes
Step 6: 60°C/∞; while holding temperature, the following were
        added: 2 ul 50 mM MgCl2, 1 ul oligo GD19.F1-Bio
        (Table 1) at 25 pmol/ul, and 2 ul Stratagene RNace-It.
        After 10 minutes, 0.5 ul Taq DNA Polymerase (Life
        Technologies, Inc.) was added and the cycling was
        continued:
Step 7: 72°C/10 minutes
Step 8: 4°C/∞

The 100 ul volume cDNA reaction mix was transferred to a 1.5 ml
siliconized microfuge tube and extracted sequentially with equal volumes of
phenol:chloroform (1:1) and chloroform, and the aqueous phase was transferred
to a new tube and placed in a speed-vac for 5 minutes at 37°C. Restriction digestion
of the cDNA was performed by adding 74 ul dH2O, 20 ul NEB 10x Buffer 2, 2 ul
1 mg/ml BSA, 4 ul SfiI and incubating at 50°C for 1 hour, then adding 10 ul 1 M
NaCl, 4 ul NotI and incubating at 37°C for an additional 1 hour. The reaction mix
was extracted sequentially with equal volumes of phenol:chloroform (1:1) and
chloroform, then cDNAs were precipitated by adding 1/100x volume 10 mg/ml
glycogen, 1/30x volume 3 M sodium acetate (pH 7.5), 2x volume 100% absolute
ethanol, and freezing at -80°C for 1 hour. The cDNA pellet was washed once
with 70% ethanol and air dried for 15 minutes, then solvated in 5 ul dH2O, 1 ul
10X NEB Ligase Buffer, and 4 ul pGD5 vector DNA previously prepared by digestion
with SfiI, NotI, and CIP. 0.5 ul T4 DNA Ligase was added, and the reaction mix
was incubated at 16°C overnight. 10 ul dH2O was added to the ligation reaction
and 0.5 ul was used to transform electro-competent E. coli DH10B cells.
Typically, 6 to 10 colonies per ul of transformed ligation mix were observed.
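
The per-ul colony yields reported in Examples 10 and 11 can be put on a common
footing by scaling them to the full ligation volume. The sketch below does
this arithmetic; it assumes, hypothetically, that the entire ligation could be
transformed at the reported per-ul efficiency, which the text does not state.
The function name colonies_expected is illustrative.

```python
def colonies_expected(ligation_ul: float, per_ul_range: tuple) -> tuple:
    """Scale a (low, high) colonies-per-ul range to the whole ligation volume."""
    low, high = per_ul_range
    return (low * ligation_ul, high * ligation_ul)


# Example 10 ligation: 11 ul cDNA + 4 ul 5x buffer + 4 ul vector + 1 ul ligase.
example_10_ul = 11 + 4 + 4 + 1

# Example 11 ligation: 5 ul dH2O + 1 ul 10X buffer + 4 ul vector + 0.5 ul ligase,
# plus the 10 ul dH2O added after ligation.
example_11_ul = 5 + 1 + 4 + 0.5 + 10

example_10_total = colonies_expected(example_10_ul, (60, 80))
example_11_total = colonies_expected(example_11_ul, (6, 10))

print(example_10_ul, example_10_total)
print(example_11_ul, example_11_total)
```

On that assumption, the Example 10 procedure would give on the order of 1200
to 1600 colonies per ligation and the single-tube Example 11 procedure roughly
120 to 200.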

Example 12: Ligation of Activation Vectors to Genomic DNA and 
Transfection into Human Cells 

Genomic DNA was harvested from a human cell line, HT1080 (10^8 cells),
according to published procedures (Sambrook et al., Molecular Cloning, Cold
Spring Harbor Laboratory Press, (1989)). The isolated genomic DNA was
digested with BamHI under conditions that resulted in incomplete digestion. This
was accomplished by titrating the amount of BamHI in the reaction. Each reaction
contained 10 ug genomic DNA and BamHI at a concentration of either 0.01, 0.02,
0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.62, or 11.24 units. After a one hour
incubation at 37°C, the reactions were stopped by phenol extraction, followed by
ethanol precipitation. The digested DNA from each reaction was separated by
agarose gel electrophoresis. Reactions containing DNA predominantly in the
range of 10 kb to 400 kb were combined for ligation to the activation vector. The
pooled, digested genomic DNA was then added to BamHI linearized activation
vector in 1X ligation buffer. Ligase (Life Technologies, Inc., 40 units) was added
and the ligation reaction was incubated at 16°C for 24 hours. Following ligation,
the genomic DNA/activation vector was transfected into HT1080 cells using
LIPOFECTIN® (Life Technologies, Inc.) according to the manufacturer's
procedures. Optionally, the HT1080 cells were irradiated prior to or after
transfection. When cells were irradiated, doses in the range of 0.1 rads to 200
rads were found to be particularly useful. Following transfection, cells were
grown in complete media. At 36 hours post-transfection, G418 (300 ug/ml) was
added to the media. At 10-14 days post selection, the drug resistant clones were
pooled, expanded, and harvested. Total RNA or mRNA was collected from the
harvested cells. cDNA derived from vector activated genes was then synthesized
and isolated using the methods described herein (see, e.g., Example 8 supra).
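
The BamHI amounts used for the partial digestion form an approximately
two-fold titration. The sketch below generates a strict two-fold series from
0.01 units and compares it with the amounts listed in the text. It only
reports where the two differ and makes no assumption about which values were
intended; the function name twofold_series is hypothetical.

```python
# Enzyme amounts (units) exactly as listed in the text.
LISTED_UNITS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.62, 11.24]


def twofold_series(start: float, points: int) -> list:
    """Strict two-fold dilution series, rounded to two decimals."""
    return [round(start * 2 ** i, 2) for i in range(points)]


expected = twofold_series(0.01, len(LISTED_UNITS))

# Pairs (strict two-fold value, listed value) where the two disagree.
departures = [
    (e, listed) for e, listed in zip(expected, LISTED_UNITS) if abs(e - listed) > 1e-9
]

print(expected)
print(departures)
```

The first nine listed amounts follow strict doubling; the last two (5.62 and
11.24) are slightly above the strict two-fold values of 5.12 and 10.24.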



Example 13: Co-transfections of BAC Contig Clones with the Activation 
Vector 

Genomic libraries were created in pUniBAC (Figure 34A-34B) according
to published procedures (Shizuya et al., Proc. Natl. Acad. Sci. USA 89:8794
(1992)). Typically, the size of genomic fragments can be between 1 kb and 500
kb, and preferably between 50 kb and 500 kb. The BAC library was propagated
in E. coli. To prepare plasmids for transfection, the library was plated onto LB
agar plates containing 12.5 ug/ml chloramphenicol. Approximately 1000 clones
were present on each 150 mm plate. Following growth and selection, the colonies
from each plate were eluted from the agar plate through the addition of LB and
pooled. Each pool (~10,000 clones) was grown in 1 liter LB/12.5 ug/ml
chloramphenicol overnight. BAC plasmids were then isolated from each pool
using a commercial kit (Qiagen).

Purified BAC clones were digested with I-Ppo-I which cleaves a unique
site in the BAC vector flanking the cloning site. Since I-Ppo-I is an ultra-rare
cutter, it will not digest the vast majority of genomic DNA inserts. Following
digestion, the linearized genomic library clones were cotransfected into HT1080
cells using LIPOFECTIN® (Life Technologies, Inc.) according to the manufacturer's
directions. Briefly, 10 ug of BAC genomic DNA was combined with 1 ug of
linearized pRIG20 (Figure 31A-31C) in α-MEM (no serum). 5 ug of
LIPOFECTIN® was added to the DNA and the mixture was incubated at room
temperature for 15 minutes. The DNA/LIPOFECTIN® mixture was then added to
10^5 HT1080 cells in a 6 well dish. The cells were incubated with the
DNA/LIPOFECTIN® in serum free α-MEM for 12 hours, washed, and placed in
α-MEM/10% FBS for 36 hours. To select for cells that had integrated the vector
and genomic DNA, the transfected cells were replated into a 10 cm dish and
incubated in the presence of 300 ug/ml G418 for 10 days. Drug resistant clones
were expanded and harvested to allow isolation of the activated cDNA molecules
as described herein in Example 8.

Example 14: In vitro Integration of Activation Vector into Purified Genomic 
DNA and Transfection of the Integration Products into Host 
Cells 

Genomic DNA was isolated and cloned into the Bacterial Artificial
Chromosome, pUniBAC (Figure 34A-34B), using published procedures
(Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press,
(1989); Shizuya et al., Proc. Natl. Acad. Sci. USA 89:8794 (1992)). Following
ligation of the genomic inserts into pUniBAC, the plasmids were transformed into
the E. coli strain DH10B (Life Technologies, Inc.) and selected on tetracycline.
Individual bacterial clones were combined into pools containing approximately
1000 members. Each pool was grown to saturation in 1 liter LB/tetracycline.
pUniBAC plasmids containing genomic DNA inserts were isolated from the
bacteria using a commercial kit (Qiagen).
For each pool of UniBAC clones, 2 ug of the library were incubated with
50 ng of the activation vector pRIG-T and 1 unit of mutant Tn5 transposase for
2 hours at 37°C (transposase available from Epicentre Technologies). Following
incubation, the pUniBAC clones were transformed into DH10B cells and selected
on chloramphenicol. All colonies from each pool were combined and grown in
1 liter LB/chloramphenicol. Plasmids were harvested using Qiagen Tip-500
columns according to the manufacturer's instructions.

For each pool, 20 ug of the library was transfected into 2x10^6 HT1080
cells with 30 ug Ex-gen 500 (MBI Fermentas) according to the manufacturer's
instructions. At 48 hours post-transfection, the cells were placed into media
containing 3 ug/ml puromycin. After 10 days of growth in the presence of
puromycin, drug resistant clones were pooled, expanded and harvested for gene
discovery. To isolate vector activated genes, mRNA from each pool of cells was
isolated, converted to cDNA, and cloned into plasmids as described in Example 8.
Individual cDNA clones were analyzed by restriction digestion and sequencing.



Example 15: Creation of Protein Expression Libraries from Cloned Genomic
DNA

A genomic library containing genomic DNA inserts (100 kb avg. size) was
created in pUniBAC as described in Examples 13 and 14. (Note: In some
embodiments of the invention, the genomic fragments are cloned into the
linearization site of an activation vector, wherein the activation vector is preferably
a YAC, BAC, PAC, or Cosmid based vector.) In this example, the activation
vector, pRIG-TP, was integrated into the BAC genomic library using in vitro
transposition as described in Example 14. pRIG-TP is shown in Figure 36.
Following integration, the library plasmids were transformed into E. coli and BAC
vectors containing an integrated pRIG-TP vector were selected for on
chloramphenicol plates. Colonies were pooled and grown to saturation in
LB/Tetracycline. BAC plasmids were harvested using a commercial kit (Qiagen).
For each transfection, 20 ug of the BAC library was transfected into 2x10^6
HT1080 cells using 30 ug Ex-gen 500 (MBI Fermentas) according to the
manufacturer's instructions. At 48 hours post transfection, the cells were placed
into media containing 3 ug/ml puromycin. After 10 days of selection, drug resistant
clones were pooled and expanded. The expanded pools of drug resistant clones
were divided into separate groups for freezing, protein production, and episome
amplification.

To isolate and test activated secreted proteins, culture supernatants were
harvested and saved at -80°C until used in specific assays. Activated intracellular
proteins were harvested from cell lysates (prepared by any method known in the
art) and used in in vitro assays.

To amplify the copy number of the BAC episomes, the cells were selected
with increasing concentrations of methotrexate. In these experiments, the initial
methotrexate concentration was 20 nM. Methotrexate concentrations were
doubled every 7 days until cells resistant to 5 uM were obtained. At each
methotrexate concentration, a portion of cells was removed for storage and
protein production. Activated secreted and intracellular proteins were harvested
from these cells as described for the non-methotrexate selected cells.
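
The methotrexate escalation described above implies a fixed number of
doubling steps. The sketch below lays out one possible schedule. It assumes
exactly 7 days at each concentration and caps the final step at 5 uM rather
than overshooting to 5.12 uM; both choices are illustrative assumptions, and
the function name methotrexate_schedule is hypothetical.

```python
def methotrexate_schedule(start_nM=20.0, target_nM=5000.0, days_per_step=7):
    """Return (day, concentration in nM) pairs for a doubling schedule capped at target_nM."""
    schedule = []
    day, conc = 0, start_nM
    while conc < target_nM:
        schedule.append((day, conc))
        day += days_per_step
        conc = min(conc * 2, target_nM)  # cap the last step at the target
    schedule.append((day, conc))
    return schedule


schedule = methotrexate_schedule()
for day, conc in schedule:
    print(f"day {day:2d}: {conc:g} nM")
```

Under these assumptions the cells pass through nine concentrations
(20 nM through 5 uM) and reach the final concentration on day 56, so the
amplification takes at least about two months.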



Having now fully described the present invention in some detail by way of
illustration and example for purposes of clarity of understanding, it will be obvious
to one of ordinary skill in the art that the same can be performed by modifying or
changing the invention within a wide and equivalent range of conditions,
formulations and other parameters without affecting the scope of the invention or
any specific embodiment thereof, and that such modifications or changes are
intended to be encompassed within the scope of the appended claims.

All publications, patents and patent applications mentioned in this
specification are indicative of the level of skill of those skilled in the art to which
this invention pertains, and are herein incorporated by reference to the same extent
as if each individual publication, patent or patent application was specifically and
individually indicated to be incorporated by reference.

WHAT IS CLAIMED IS:



1. A vector construct comprising:

(a) a first transcriptional regulatory sequence operably linked to a first 
unpaired splice donor sequence; 

(b) a second transcriptional regulatory sequence operably linked to a 
second unpaired splice donor sequence; and 

(c) a linearization site. 



2. The vector construct of claim 1, wherein said linearization site is
located between said first unpaired splice donor site and said second
transcriptional regulatory sequence.

3. The vector construct of claim 1, wherein when said vector 
integrates into the genome of a host cell, said first transcriptional regulatory 
sequence is in an inverted orientation relative to the orientation of said second 
transcriptional regulatory sequence. 

4. The vector of claim 1, wherein said vector has been rendered linear
by cleavage at said linearization site.



5. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) an unpaired splice donor site; 

(c) a rare cutting restriction site; and

(d) a linearization site. 

6. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) a vector-encoded exon comprising a rare cutting restriction site; 

(c) an unpaired splice-donor site; and

(d) a linearization site.

7. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) a vector-encoded exon comprising a first rare cutting restriction 
site; 

(c) an unpaired splice-donor site; 

(d) a second rare cutting restriction site; and 

(e) a linearization site. 

8. A vector construct comprising: 

(a) a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal; and 

(b) a second transcriptional regulatory sequence operably linked to an 
exon-splice donor site complex, 

wherein said first transcriptional regulatory sequence is in the same orientation in 
said vector construct as said second transcriptional regulatory sequence. 

9. A vector construct comprising a transcriptional regulatory 
sequence operably linked to a selectable marker lacking a polyadenylation signal, 
and further comprising an unpaired splice donor site. 

10. A vector construct comprising a first transcriptional regulatory
sequence operably linked to a selectable marker lacking a polyadenylation signal,
and further comprising a second transcriptional regulatory sequence operably
linked to an unpaired splice donor site.

11. The vector construct of any one of claims 1, 8, or 10, wherein said
first transcriptional regulatory sequence or said second transcriptional regulatory
sequence is a promoter.

12. The vector construct of claim 11, wherein said promoter is selected
from the group consisting of a CMV immediate early gene promoter, an SV40 T
antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.

13. The vector construct of any one of claims 5-7 or 9, wherein said 
transcriptional regulatory sequence is a promoter. 

14. The vector construct of claim 13, wherein said promoter is selected
from the group consisting of a CMV immediate early gene promoter, an SV40 T
antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.

15. The vector construct of any one of claims 8-10, wherein said
selectable marker is selected from the group consisting of a neomycin gene, a
hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase
gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate
synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an
aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase
gene, an adenosine deaminase gene, and a thymidine kinase gene.

16. A vector construct comprising: 

(a) a positive selectable marker; 

(b) a negative selectable marker; and 

(c) an unpaired splice donor site, 

wherein said positive and negative selectable markers and said splice donor site are 
oriented in said vector construct in an orientation that results in expression of said 
positive selectable marker in active form, and either non-expression of said 
negative selectable marker or expression of said negative selectable marker in 
inactive form, when said vector construct is integrated into the genome of a 
eukaryotic host cell in such a way that an endogenous gene in said genome is
activated.

17. The vector construct of claim 16, wherein said positive selection
marker and said negative selection marker both lack a polyadenylation signal.

18. The vector construct of claim 16, wherein said positive selection
marker is selected from the group consisting of a neomycin gene, a hypoxanthine
phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a
glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase
gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate
transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, and
an adenosine deaminase gene.

19. The vector construct of claim 16, wherein said negative selection
marker is selected from the group consisting of a hypoxanthine phosphoribosyl
transferase gene, a thymidine kinase gene, and a diphtheria toxin gene.

20. A eukaryotic host cell comprising the vector construct of any one 
of claims 1, 5-10, or 16. 

21. The eukaryotic host cell of claim 20, wherein said cell is an animal
cell.



22. The eukaryotic host cell of claim 21, wherein said animal cell is 
selected from the group consisting of a mammalian cell, an insect cell, an avian 
cell, an annelid cell, an amphibian cell, a reptilian cell, and a fish cell. 

23. The eukaryotic host cell of claim 21, wherein said animal cell is a
mammalian cell.



24. The eukaryotic host cell of claim 23, wherein said mammalian cell
is a human cell.

25. The eukaryotic host cell of claim 20, wherein said cell is a plant
cell.

26. The eukaryotic host cell of claim 20, wherein said cell is a fungal
cell.

27. The eukaryotic host cell of claim 26, wherein said fungal cell is a
yeast cell.

28. The eukaryotic host cell of claim 21, wherein said cell is an isolated
cell.

29. The eukaryotic host cell of claim 21, wherein said vector construct
is integrated into the genome of said host cell.

30. A primer molecule comprising a PCR-amplifiable sequence and a
degenerate 3' terminus, wherein said primer molecule has the structure:

5'-(dT)a-X-Nb-TTTATT-3',
wherein a is a whole number from 1 to 100, X is a PCR-amplifiable sequence
consisting of a nucleic acid sequence of about 10-20 nucleotides in length, N is
any nucleotide, and b is a whole number from 0 to 6.

31. The primer molecule of claim 30, wherein said PCR-amplifiable 
sequence comprises one or more restriction sites. 

32. The primer molecule of claim 30, wherein a is a whole number 



33. The primer molecule of claim 30, wherein said primer molecule 
comprises one or more hapten molecules conjugated to one or more bases of said 
primer molecule.

34. The primer molecule of claim 33, wherein said hapten molecules
are selected from the group consisting of biotin, digoxigenin, an antibody, an
enzyme, lipopolysaccharide, apotransferrin, ferrotransferrin, insulin, a cytokine, an
extracellular matrix protein, an integrin, ankyrin, C3bi, fibrinogen, spectrin, a
cytokine receptor, an insulin receptor, a transferrin receptor, polymyxin B,
endotoxin-neutralizing protein (ENP), an enzyme-specific substrate, protein A,
protein G, a cell-surface Fc receptor, an antibody-specific antigen, an antibody-
specific peptide, avidin, and streptavidin.

35. The primer molecule of claim 33, wherein said hapten molecule is
biotin.



36. A method for first strand cDNA synthesis comprising: 

(a) annealing a primer of claim 30 to an RNA template molecule to
form a primer-RNA complex; and

(b) treating said primer-RNA complex with reverse transcriptase and 
one or more deoxynucleoside molecules under conditions favoring 
the reverse transcription of said primer-RNA complex to 
synthesize a first strand cDNA. 

37. A method for isolating an activated gene from a host cell genome, 
comprising: 

(a) introducing a vector comprising a transcriptional regulatory 
sequence, a vector-encoded exon, an unpaired splice donor site, 
and a vector-encoded intron into a host cell; 

(b) allowing said vector to integrate into the genome of said host cell 
by non-homologous recombination, under conditions such that 
said vector activates an endogenous gene in said genome; 

(c) isolating RNA from said host cell; 

(d) synthesizing first strand cDNA by reverse transcription of said
isolated RNA;

(e) annealing a primer specific for said vector-encoded exon to said
first strand cDNA to create a primer-first strand cDNA complex;
and

(f) contacting said primer-first strand cDNA complex with a DNA 
polymerase under conditions favoring the production of a second 
strand cDNA product substantially complementary to said first 
strand cDNA. 

38. A method for isolating an activated gene from a host cell genome, 
comprising: 

(a) introducing a vector comprising a transcriptional regulatory 
sequence, a vector-encoded exon, an unpaired splice donor site, 
and a vector-encoded intron into a plurality of host cells; 

(b) allowing said vector to integrate into the genomes of said host 
cells by non-homologous recombination, under conditions such 
that said vector activates an endogenous gene in said genomes; 

(c) cultivating said host cells under conditions favoring the production 
of a plurality of individual clones from said host cells, wherein 
each of said individual clones in said plurality of clones contains 
said vector integrated into a unique site in said host cell genome; 

(d) isolating RNA from said plurality of clones; 

(e) synthesizing first strand cDNA by reverse transcription of said 
isolated RNA; 

(f) annealing a first primer specific for said vector-encoded exon to 
said first strand cDNA to create a primer-first strand cDNA 
complex; and 

(g) contacting said primer-first strand cDNA complex with a DNA 
polymerase under conditions favoring the production of a second 
strand cDNA product substantially complementary to said first 
strand cDNA.

39. The method of claim 37, further comprising treating said second
strand cDNA product with a restriction enzyme that cleaves at a restriction site
located on said vector-encoded exon.

40. The method of claim 38, further comprising treating said second 
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded exon. 

41. The method of claim 37, further comprising treating said second 
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded intron downstream of said unpaired splice donor 
site. 

42. The method of claim 38, further comprising treating said second 
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded intron downstream of said unpaired splice donor 
site. 

43. The method of claim 37, further comprising amplifying said second 
strand cDNA product using a second primer specific for said vector-encoded exon 
and a third primer specific for said first primer. 

44. The method of claim 38, further comprising amplifying said second 
strand cDNA product using a second primer specific for said vector-encoded exon 
and a third primer specific for said first primer. 

45. An isolated gene produced according to the method of any one of 
claims 37-44. 

46. A host cell comprising the isolated gene of claim 45. 

47. A vector comprising the isolated gene of claim 45. 



48. The vector of claim 47, wherein said vector is an expression vector. 



49. A method of producing a polypeptide, comprising: 

(a) introducing the vector of claim 47 into a host cell; and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide encoded by said isolated gene. 



50. The method of claim 49, further comprising isolating said 
polypeptide. 



51. A polypeptide produced according to the method of claim 49 or 
claim 50. 



52. A method of producing a polypeptide, comprising: 

(a) introducing into a host cell a vector comprising a transcriptional 
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -3, or 
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice 
donor site. 

53. A method of producing a polypeptide, comprising: 

(a) introducing into a host cell a vector comprising a transcriptional 
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -2, or 
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice 
donor site. 

54. A method of producing a polypeptide, comprising: 

(a) introducing into a host cell a vector comprising a transcriptional 
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -1, or 
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice 
donor site. 
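
Claims 52-54 differ only in the recited position of the translational start site (-3, -2, or -1, or any increment of 3 bases further upstream), so that taken together they appear to cover the three possible reading frames relative to the splice donor site. A minimal sketch of that arithmetic follows, assuming the recited position is that of the first base of the start codon and that position -1 denotes the base immediately upstream of the reference base; the function names are hypothetical and are used for illustration only.

```python
from typing import List


def start_codon_frame(position: int) -> int:
    """Return the reading-frame class (0, 1, or 2) of a start codon.

    `position` is the (negative) coordinate of the first base of the start
    codon, counted upstream from the 5'-most base of the splice donor site.
    Assumption: -1 is the base immediately upstream of that reference base.
    """
    if position >= 0:
        raise ValueError("position must be negative (upstream of the reference base)")
    # Number of bases between the start codon and the reference base, modulo
    # the codon length, identifies which of the three frames is used.
    return (-position) % 3


def positions_in_same_frame(base_position: int, count: int = 4) -> List[int]:
    """List `count` positions at increments of 3 bases upstream of `base_position`."""
    return [base_position - 3 * k for k in range(count)]


if __name__ == "__main__":
    for base in (-3, -2, -1):
        positions = positions_in_same_frame(base)
        frames = {start_codon_frame(p) for p in positions}
        print(base, positions, frames)
```

Under these assumptions, every position recited in a given claim falls in a single frame class, and the three claims fall in three different classes.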

55. The method of any one of claims 52-54, further comprising 
isolating said polypeptide. 

56. A polypeptide produced by any one of claims 52-54. 

57. A polypeptide produced by the method of claim 55. 

Compositions and Methods for Non-targeted Activation of 
Endogenous Genes 

Abstract 

The present invention is directed generally to activating gene expression 
or causing over-expression of a gene by recombination methods in situ. The 
invention also is directed generally to methods for expressing an endogenous gene 
in a cell at levels higher than those normally found in the cell. In one embodiment 
of the invention, expression of an endogenous gene is activated or increased 
following integration into the cell, by non-homologous or illegitimate 
recombination, of a regulatory sequence that activates expression of the gene. In 
another embodiment, the expression of the endogenous gene may be further 
increased by co-integration of one or more amplifiable markers, and selecting for 
increased copies of the one or more amplifiable markers located on the integrated 
vector. In another embodiment, the invention is directed to activation of 
endogenous genes by non-targeted integration of specialized activation vectors, 
which are provided by the invention, into the genome of a host cell. The invention 
also provides methods for the identification, activation, isolation, and/or 
expression of genes undiscoverable by current methods since no target sequence 
is necessary for integration. The invention also provides methods for isolation of 
nucleic acid molecules (particularly cDNA molecules) encoding a variety of 
proteins, including transmembrane proteins, and for isolation of cells expressing 
such transmembrane proteins which may be heterologous transmembrane proteins. 
The invention also is directed to isolated genes, gene products, nucleic acid 
molecules, to compositions comprising such genes, gene products and nucleic acid 
molecules, and to vectors and host cells comprising such genes and nucleic acid 
molecules, that may be used in a variety of therapeutic and diagnostic 
applications. Thus, by the present invention, endogenous genes, including those 
associated with human disease and development, may be activated and isolated 
without prior knowledge of the sequence, structure, function, or expression profile 
of the genes. 

5VVGATCTTCAATA1TGGCCATT^^ 

AATATTGGCTATTGGCCA1TGCATA 

CGTTOTATCTATATCAT^ 

CCATGTrGGCATTGATTATTGACT 

AGTTATTAATAGTAATCAATTACGGG^^^ 

TCCGCGTTACATAACTTACGGTAAA 

TCGCCCGCCTGGCTGACCGGCCAACGACCCCCGCCCATlXrACGTCAATAATCACG 
TATGTTCCCATAGTAACGrCCAATAG 
GGACITTCCATTGACGTCAATGGGIXXJAGTATC^ 
AGTAC ATC AAGTGTATCATATGCCA 

AGTCCGCCCCCTATTCACGTCAATGACGGTAAATGGCe<^CCIX?GCATl^ATGCCC 
AGTACATGACCTTAOKKJACTTTCC 

TACTTGGCAGTACATCTACGTATTAGTCATCGCTATrACCA 
GGCAGtACACCAATGK3GCGTX5GAT 
AGCGGTTTGACTCACGGGGATTTCCAA 
TITGTTTTGGCACCAAAATCAACGG 

GACTTTCCAAAATGTCGTAACAACTGCGATCGCCCGCCCCGTTGACGCAAATGfGG 
CGGTAGGCGTGTACGGTGGGAGGTC 

TATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGAAGCTTTATTGCGG 
TAGTTTATCACAGTTAAATTGCTAA 

CGCAGTCAGTGCTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCTT 
AATTAACTCCACCAGTCTCACTTCA 

GTTCCTTTTGCCTCCACCAGTCTCACTTCAGTTCCTTTTGCATGAA 
TC/ukAAGAGGAAACCAACCCCTAA 
GATGAGCTTTCCATGTAAATTTGTAGCCAGCCT^ 
CCAAAGGTGCAGTCTCCAAAGAGA 

TTACGAATGCCTTGGAAACCTGGGGTGCCTTGGGTCAGGACATCAACTTGGACAT 
TCCTAGTTTTCAAATGAGTGATGAT 

ATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTCA 
GAAAAGAGAAAGAGACTTTCAAGGA 

AAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAAG 
ACCGATGATCAGGATATCTACAAGG 

TATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTTGAA 
GATTCAAGAGAGGGTCTCAAAACCA 

AAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGAA 
CTGACCCCGAATTAAACCTGTATCA 

AGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCACC 
AGCCTGAGTGCAAAATTCAAGTGCA 

CAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCAG 
AGAAAGGGATCCAGGTGAGTAGGGCC 

CGATCGTTCTAGAGTCGAGCTCTCTTAAGGTAGCAAGGTTACAAGACAGGTTTAA 
GGAGACCAATAGAAACTGGGCTTGT 

CGAGACAGAGAAGACTCTTGCGTTTCTGATAGGCACCTAITGGTCTTACGCGGCC 

GCGAATTCCAAGCTTGAGTATTCTA 

TCGTGTCACCTAAATAACTTGGC^^ 

ATTGTTATCCGCTCACAATTCCACA 

CAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAG 
CTAACTCACATTAATTGCGTTGCGCGATG<nTCCATTTTGTGAGGGTTAATGC- 
1TCGAGAAGACA1X5ATAAGATACATTC 

GCAGTGA AAAAAATGCTTTATT1XtTGAAATTTGTGATG 

CCATrATAAGCTGCAATAAACA 

AGTTAACAACAACAAl'TGCA'l'rCATrrrATGlTJGAGG'ITCAGGGGGAGAl^GTGG 
GAGGTTTTTTAAAGCAAGTAAAACC " 
TCTACAAATGTGGTAAAATCGGATAAGGATCGATTCCGGAGCCTGAATGKK^GAAT 
GOACGCGCCCTOTAGdGGCGCATTA 

AGCGCGGCGGGTGTGGTCGTTACGCGCACGTCACCGCTACACITLTjCCAGCGCCC 

TAGCGCCCGCTCCITTCGCTlTCriTC 

CCTTCCTTTCTCGCCACGTTCGCCGGC 

TCCCTTTAGGGTreGGATITAGTGC 

TITACGGCACCTCGACCCCAAAAAACriXMTTAG 

CCATCGCCCTGATAGACGGTTTTTC 

GCCCTTTGACG1TGGAGTCCACGTTCTTTAATAGTCGACTC1TGTTCCAAACTGG 
AACAACACTCAACCCTATCTCGGTC 

TATTC 1 1 JL I GATTTATAAGGGAT1 l^GCCG AlTTCGGCCrAl^rGGll A A A A A ATG A 
GCTGATTTA AC AAAAATTTAACGC . * • 

GAATTTTAACAAAATATTAACGCTTACAATTTCGCCTGTGTACCTTCTGA 
AAAGAACCAGCTGTGGAATGTGTGT 

CAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGC 
ATGCATCTCAATTAGTCAGCAACCAG 

GTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCT 
CAATTAGTCAGCAACCATAGTCCCGC 

CCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCC 
CCATGGCTGACTAATTTTTTTTATT 

TATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGGA 
GGCTTTTTTGGAGGCCTAGGGTTTTG 

CAAAAAGCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCA 
TGATTGAACAAGATGGATTGCACGC 

AGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAG 
ACAATCGGCTGCTCTGATGCCGCCG 

TGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTC 
CGGTGCCCTGAATGAACTGCAGGAC 

GAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTG 
CTCGACGTTGTCACTGAAGCGGGAAG 

GGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTT 
GCTCCTGCCGAGAAAGTATCCATCA 

TGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGA 
CCACCAAGCGAAACATCGCATCGAG 

CGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGAA 
GAGCATCAGGGGCTCGCGCCAGCCGA 

ACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTGAC 
CCATGGCGATGGCTGCTTGCCGAATA 

TCATGGTGGAAAATGGCCGCTTTTCTGGATTCATCGACTGTGGCCGGCTGGGTGT 
GGCGGACCGCTATCAGGAGATAGCG 

TIXKjCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGC^ 
TCGTGCTTTACGGTATCGCCGCTCC 

CGAlTCGCAGCGCATCGCCITCrATCGCCrrGriGACGAGTTCrrCTGAGCGGGA 
CTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGATGGC- 
S^^P^^TATCI^ 
1 CCGCGTA- 

TCaTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGAC 
ACCCGCCAACAC 

CCGCrGACGCGCCCrGACGGGO'I^Gl'CTGCTCCCGGCATCCGCrrrACAGACAAGC 
TGTGACCGTCTCCGGGAGCTGCATG 

TGTCAGAGGTlTrCACCGTCATCACCGAAACGCGCGAGACGAAAGGGCCrCGTGA 
TACGCCTATTTITATAGGTTAAIXTT 

CATGATAATAATCGlTl^CTTAGACGTCAGGTGGGAOTlTCGGGGAAAlXjTGCGC 
GGAACCCCTATTTO1TTAT1TTTC1" 

AAATACAlTCAAATAimATCCGCTCATCAGACAATAACCCTGATAAATGCTTCA 
ATAATATlXrAAAAAGGAAGAGTATG 

AGTATTCAACAlXrCCGTOTCGCCCITAlTrCCCrnTITGCGGCAlTITGCClTCC 

TGTnTTGCTCACCCAGAAACGCT 

(KxTGAAAGTAAAAGATG^HGAAGAT^^ 

ACTGGATCTCAACA'GCGGTAAGATCC 

TTGAGAGTTTTCGCCCCGAAGAACGTTTTCCA 

GCTATGTGGCGCGGTATTATCCCGT 

ATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACT 
TGGTTGAGTACTCACCAGTCACAGA 

AAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACC 
ATGAGTGATAACACTGCGGCCAACT 

TACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACAT 
GGGGGATCATGTAACTCGCCTTGAT 

CGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACG 
ATGCCTGTAGCAATGGCAACAACGTT 

GCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATA 
GACTGGATGGAGGCGGATAAAGTTG 

CAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATC 
TGGAGC CGGTG AGCGTGGGTCTCGC 

GGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCT 
ACACGACGGGGAGTCAGGCAACTAT 

GGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGG 
TAACTGTCAGACCAAGTTTACTCAT 

ATATAClTrrAGArTGATTTAAAACTTCATnTTAATlTAAAAGGATCTAGGTGAAG 
ATCCTTTTTGATAATCTCATGACC 

AAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGA 
TCAAAGGATCTTCTTGAGATCCTTT 

TTTTCTGCGCGTAATCTGGTGGTrGGAAACAAAAAAACCACCGCTACCAGCGGTG 

GTTTGTTTGCCGGATCAAGAGCTAC 

CAACTCTTTTTCCGAAGGTAACTGG^ 

CCTTCTAGTGTAGCCGTAGTTAGGC 

CACCACTTCAAGAACTCTGTAGGACCGGCTACATACCfCGGTCTGCTAATCCTGT 
TACCAGTGGCTGCTGCCAGTGGCGA 

TAAGTCGTGTCTTACCGGGTTGGACrCAAGACGATAGTTACCGGATAAGGCGCAG 
CGG'rCGGGCTGAACGGGGGGTTCGT • 

GCACACAGCCCAGaiTGGAGGGAACGACCTACACCGAACTGAGATACCTACAGC 
GTGAGCTATGAGAAAGCGCCACGCTT 

CCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGG- 
AGAQCGCACGAGGGAGGTTCCAGGGGGAAA<XKXJTGCT 

GGGTlTCGCCACCTCTGACTl^GAGCGTCGATri^GTGATGCTCGTCAGGGG 

GGCGGAGCCTATGGAAAAACGCCAGGAACGCGGCCT1TT1ACGG1TCCTGGCCTT 

TTGCTGGCCTl-rrGCl^CACATGGCT 

CGAC3' 

5*AGA1XT1^CAATATTCGCCATTA 

AATATTGGCTATTGGCCATTGCAT 

ACGITCTATCTATATCATAATATGTACA^ 

GCCATG1TGGCATTGA1TATTGAC 

TAGTTATTAATAGTAATCAATTACG^ 

TTCCG CGTTAC ATA ACITACGGTA A 

ATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGAGGTCAATAATGAC 
GTATGTTCCCATAGTAACGCCAATA 
GGOACTTTCCATTOACGTCAATGKKH^ 
CAGTACATCAAGTGTATCATATGCC 

AAGTCCGGCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCA1TATGCC 
CAGTACAIXjACCTTACGGGACTTTC 

GTACTTGGCAGTACAlXrrACGTATTAGTCATCGCTATlACCATGGTGATGCGGlT 

TIX3GCAGTACACCAATG<JCrcGTGGA 

TAGCGGTTTGACTCACGGG^^ 

GT1TGTTTTGGCACCAAAATCAACG 

GGACTTTCCAAAATGTCGTAACAACTG€GATCGCCCG€CCCGTTGACGCAAATGG 
GGGGTAGGCGTGTACGGTGGGAGGT 

CTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGAAGCTTTATTGCG 
GTAGTTTATCACAGTTAAATTGCTA 

ACGCAGTCAGTGCTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCT 

TAATTAACTCCACCAGTCTCACTTC 

AGTTCCTTTTGCCTCCACCAGTCTCACTTCAG 

ATCAAAAGAGGAAACCAACCCCTA 

AGATGAGCTTTCCATGTAAATTTGTAGCCAG^ 

TCCAAAGGTGCAGTCTCCAAAGAG 

ATTACGAATGCCTTGGAAACCTGGGGTG<:CTTGGGTCAGGACATCAACTTGGACA 
TTC CT AGTTTTC AAATGAGTGATG A 

TATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTC 
AGAAAAGAGAAAGAGACTTTCAAGG 

AAAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAA 
GACCGATGATCAGGATATCTACAAG 

GTATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTTGA 
AGATTCAAGAGAGGGTCTCAAAACC 

AAAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGA 
ACTGACCCCGAATTAAACCTGTATC 

AAGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCAC 
CAGCCTGAGTGCAAAATTCAAGTGC 

ACAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCA 
GAGAAAGGGATCCCAGGTGAGTAGGG 

CCGGATCCTTCTAGAGTCGAGCTCTCTTAAGGTAGGAAGGTTACAAGACAGGTTT 
AAGGAGACCAATAGAAACTGGGCTT 
GTCGAGACAGAGAAGACTCTTGGGTTTCTG^ 
CCGCGAATTCCAAGCTTGAGTATTC 

TATCGTGTCACCTAAATAACTTGGCGTAATCATGGTCATATCTGTTT^CCTGTGTGA 
AATTGTTATCCGCTCACAATTCCA 

cacaacatacga-gccggaagcataaagtgtaaagccrggggtgcctaatgagtg 
agctaactcacattaattgcgttgcg 

cgatg<;ttccattttgtgagggttaatgcttcgagaagacatgataagatacatt 
gatgagtttggacaaaccacaacaagaatggagtgaaaaaaatgctttatttgt- 
gaaatctctgatgctattGcttta^ 

caaglxaacaacaacaattgcattcatttrratgtttcaggttcaggggoagatgt 
gggagg 1 '1'1"1"1taaagcaagtaaaa 

CCTCTACAAATGTGGTAAAATCCGATAAGGATCGAITGCGGAGCGIXjAATGGCGA 
ATGG ACGCG CCCTGTAGCGGCGGAT 

TAAGCGCGGCGGGTGTGGTGGTTACGCGGACGTGACCGCTACACTTGCCAGCGC 
CCTAGCGGCCGCTCCITTCGCTTTCT 

TCCCT^TCCTTTCTCGCCACGITTCGCCGGCTITCCCCGTCAAGCTCTAAATCGGGG 

GCTCCCITTAGGGTTCCGA'ITTAGT 

GC1TTACGGCACCT(^ACCCGAAAAAACITGA^ 

GGCCATCXK^CCTOAtAGAOSGTm 

TCGCCCTTTGACGTTGGAGTCCAGGTTCT1TAATAGTC 

GAACAACACTCAACCCTATCTCGG 

TCTATTCTTTTX^TTTATAAG^ 

GAGCTGATTTAACAAAAATTTAAC 

GGGAATI^AACAAAATATTAACGCITACAATTTCGCCTGTGTACCI'I^CTGAGGC 
GGrAAAGAACCAGCTGTGGAATGTGT 

GTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAA 
GCATGCATCTCAATTAGTCAGCAACC 

AGGTGTGGAAAGTCCCCAGGCTCCCCAGeAGGCAGAAGTATGCAAAGCATGCAT 
CTCAATTAGTCAGCAACCATAGTCCC 

GCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCG 
CCCCATGGCTGACTAATTTTTTTTA 

TTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGG 
AGGCTTTTTTGGAGGCCTAGGCTTT 

TGCAAAAAGCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCAC 
CATGATTGAACAAGATGGATTGCAC 

GCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAAC 
AGACAATCGGCTGCTCTGATGCCGC 

CGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTG 
TCCGGTGCCCTGAATGAACTGCAGG 

ACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTG 
TGCTCGACGTTGTCACTGAAGCGGGA 

AGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACC 
TTGCTCCTGCCGAGAAAGTATCCAT 

CATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTC 
GACCACCAAGCGAAACATCGCATCG 

AGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACG 
AAGAGCATCAGGGGCTCGCGCCAGCC 

GAACTGTTCGCCAGGCTCAAGGCGCGGATGCCCGACGGCGAGGATCTCGTCGTG 
ACCCATGGCGATGCCTGCTTGCCGAA 

TATCATGGTGGAAAATGGCCGCTTTTCTGGATTGATCGACTGTGGCCGGCTGGGT 

GTGGCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATIGCTGAAGAGC 

TrGGGGGGGAATGGGCTGACCGCTTCCTGGTGCTTTACGGTATCGCCGCr 

CCCGATTCG^AGGGCATCGCCCT^ 

GACTCTGGG GTTCGAAATGACCGAC 

CAAGCGACGCCCAAGCTGCCATCACGATGGGCGGAATAAAATATCTTTATTTTCA 
TTACATCTGTGTGTTGGTTTTTTGT 

GTGAAGATCCGCGTATGGTGCACTCTCAGTACAATCTGGTCTGATGCCGCATAGT 
TAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGGT- ' 
TGHDGTCAGAGGTTTTCACCGTC 
GATACGCCTA1TTTTATAGGTTAAT 

GTCATCATAATAATGOl^TTCTTAGACGTCAGGTGGCACTri^CGGGGAAATGl^GC 
GCGGAACCCCTAlTTGriTAlTTIT 

CTAAATACATTCAAATATGTATCCGCTCATGAGACAATA 
CAATAATATTGAAAAAGGAAGAGTA 
TGAGTATTCAACATTTCCGTC^ 
CCTGTmTGCTCACCCAGAAACXJ 

CTGGTCAAAGTAAAAGATGCTGAAGATCAGTl'GGGTCC^ 
GAACTGGATCTCAACAGCGGTAAGAT ' 
CCTTXxAGAGTTITCGCCCeGAAGAACGTITTC 
CTGCTATGTGGCGCGGTATTATCCC 

GTATTGACGGCGGGCAAGAGCAACTCGGTCGCCGCATACAClATrCTCAGAATGA 
CTTGGTTGAGTACTCACCAGTCACA 

GAAAAGCATCl^ACGGATCGCATGACAGTAAGAGAATrATGCAGTCCTGCeATAA 
CCATGAGTGATAACACTGCGGCCAA " 

CTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTT^ 
ATGGGGGATCATGTAACTCGCCTTG 

ATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCA 
CGATGCCTGTAGCAATGGCAACAACG 

TTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGKjCAACAATIAA 
TAGACTGGATGGAGGCGGATAAAGT 

TGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAA 

tctggagccggtgagcgtgggtctc 

gcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttat 
ctacacgacggggagtcaggcaact 

atggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcatt 
ggtaactgtcagaccaagtttactc 

ATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTA^ 
AGATCCTTTTTGATAATCTCATGA 

CCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA 
GATCAAAGGATCTTCTTGAGATCCT 

TTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGG 
TGGTTTGTTTGCCGGATCAAGAGCT 

ACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACT 
GTCCTTCTAGTGTAGCCGTAGTTAG 

GCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCT 

GTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCA 

AGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC 

GTGCACACAGGCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATAGCTACA 

GCGTGAGCTATGAGAAAGCGGCACGC 

TTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAG 
GAGAGCGCACGAGGGAGCTTGCAGGG 

GGAAACGCCTGGTATCTriATAGTCCrGTCGGGT T lTCGCCACC'rGfGAClTGAGC 

gtcgatttttgtgatgctcgtcagg 
gkkxtcggagcctatggaaaaacgc^^ 

CTTTTGGTGGGCTTTTGCTCACATGG 
CTCGAC3' 

5'AGATCTrCAATAlTG<JCCATT^ 
AATATTXK3CTATTGGCCATTGCAT 

ACGTIWATCTATATCATAATATGTACATI^ATAITXJGCI^ATGTCCAATATGACC 

GCCATGTTXKK^ATTGATTATTOAC 

TAGTTATTAATAGTAATCAATTACGGGGTCA 

TTCCGCGTTACATAACTTACGGTAA 

ATCKKX^CGCCTGGCTGACCGCC^^ 

GTATGTTCCCATAGTAACGCCAATA 

G<3GACTTTXIX3ATTCfACGT^ 

CAGTACATCAAGTGTATCATATGCC 

AAGTCCGCCCCCTAITGACGTCAATGAC^ 

C^GTACATGACCTTAOKKJACITTC 

CTACTrGGCAGTACATCTACGTATTAGTCATCGCrAITACCA'rGGTGATGCGGTF 
TIXK3KIAGTACACCAATGGGCGTGGA - 
TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAA^ 
GTTTXjTTTTGGCACCAAAATCAACG 

GGACTTTCCAAAATGTCGTAACAACTGCGATCGCCCGCCCCGTTGACGGAAATGG 
GCGGTAGGCGTGTACGGTGGGAGGT 
CTATATAAGCAGAGCTCGTTTAGTGAAC^^ 
GTAGTTTATCACAGTTAAATTGCTA 

ACGCAGTCAGTGCTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCT 

TAATTAACTCCACCAGTCTCACTTC 

AGTTCCTTTTGCCTCCACCAGTC^^ 

ATCAAAAGAGGAAACCAACCCCTA 

AGATGAGCTTTCCATGTAAATTTGT^^ 

TCCAAAGGTGCAGTCTCCAAAGAG 

ATTACGAATGCCTTGGAAACCTGGGGTGCCTTGGGTCAGGACATCAACTTGGACA 
TTCCTAGTTTTCAAATGAGTGATGA 

TATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTC 
AGAAAAGAGAAAGAGACTTTCAAGG 

AAAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAA 
GACCGATGATCAGGATATCTACAAG 

GTATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTTGA 
AGATTCAAGAGAGGGTCTCAAAACC 

AAAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGA 
ACTGACCCCGAATTAAACCTGTATC 

AAGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCAC 
CAGCCTGAGTGCAAAATTCAAGTGC 

ACAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCA 
GAGAAAGGGATCCACAGGTGAGTAGG 

GCCCGATCCrrCTAGAGTCGAGCTCTCITAAGGTAGCAAGG'rrACAAGACAGGTT 
TAAGGAGACCAATAGAAACTGGGCT 

TGTCGAGACAGAGAAGACTCTTGCGTTTCTGATAGGCACCTATTGGTCTTACGCG 
GCCGCGAATTCCAAGCTTGAGTATT 

CTATCGTGTCACCTAAATAACTTGGCGTAATCATGGTCATATCT 
AAATTGTTATCCGCTCACAATTCC 

ACACAACATACGAGGCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGT 
GAGCTAACTCACATTAATTGCGTTGC 

GCGATGCTTCCATTTTGTGAGGGTTAATGCTTCGAGAAGACATGATAAGATACAT 
TGATGAGTTTGGACAAACCACAACAAGAATGCAGTGAAAAAAATGC- 
TTTATTTGTGAAATTTG'^GrATG 

CTA1TGC1^ATTTGTAACCA1T:ATAAGC1X3CAATAA 

ACAAGTTAACAACAACAATTGCATTCATl^T'ATGTTTCAGGl^TCAGGGGGAGATG 
TGKKJAGGTTTTTTAAA.GCAAGTAAA 

ACCTCTACAAATGTGGTAAAATCCGATAAGGATCGATTCCGGAGGCTGAATGGCG 
AATGGACGCGCCCTGTAGCGGCGCA 

TrAAGCGCGGCGGGTGlXKTTGGITACGCGCACGTGACCGCrACACrrGCCAGCGC 
CCTAGCGCCCGCTCCrTTCGCTTrC 

TTCCCTTCC1TTCTCGCCACGTTCGCCGGC1TTCCCCGTCAAGCTCTAAATCGGGG 
GCTCCCTTTAGfGGTTCCGATTTAG 

TGCTTTACGGCACCTCGACCCCAAAAAAClTGA'lTAGGGIXrATGG^CACGTAGT 

GGGCCATCGCCCTGATAGACG<5TTT 

TTCGCCCrTTGACGT^^ 

GGAACAACACrCAACCCTATGTCG 

GTCTATl'ClTTl'GAlTTATAAGGGATTTTGCCGATITCGGCClAl^rGGTIAAAAAA 
TGAGCTGATTTAACAAAAATTTAA 

CGGGAATTTTAACAAAATATTAACGCTTACAATTTCGCCTGTGTACCTTCT 
CGGAAAGAACCAGCTGTGGAATGTG 

TGTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAA 
AGCATGCATCTCAATTAGTCAGCAAC 

CAGGTGTGGAAAGTGCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCA 
TCTCAATTAGTCAGCAACCATAGTCC 

CGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCC 
GCCCCATGGCTGACTAATTTTTTTT 

atttatgcagaggccgaggccgcctcggcctctgagctattccagaagtagtgag 
gaggcttttttggaggcctaggctt 

ttggaaaaagcttgattcttctgacacaacagtctcgaacttaaggctagagcca 
ccatgattgaacaagatggattgca 

cgcaggttctccggccgcttgggtggagaggctattcggctatgactgggcacaa 
cagacaatcggctgctctgatgccg 

ccgtgttccggctgtcagcgcagggg<:gcccggttctttttgtcaagaccgacct 
gtccggtgccctgaatgaactgcag 

gacgaggcagcgcggctatcgtggctggccacgacgggcgttccttgcgcagct 
gtgctcgacgttgtcactgaagcggg 

aagggactggctgctattgggcgaagtgccggggcaggatctcctgtcatgtcac 
cttgctcctgccgagaaagtatcca 

tcatggctgatggaatgcgggggctgcatacgkntgatccggctacctggccatt 
cgaccaccaagcgaaacatcgcatc 

gagcgagcacgtactcggatggaagccggtcttgtcgatcaggatgatctggac 
gaagagcatcaggggctcgcgccagc 

CGAACTGTTCGGCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTeGT 
GACCCATGGCGATGGCTGCTTGCCGA 

ATATCATGGTGGAAAATGGCCGCTTTTCTGGATTCATCGACTGTGGCCGGG 
TGTGGCGGACCGCTATCAGGACATA 

GCGTTGGCTACCCGTGATATTGCTGAAGAGCrTGGCGGCGAATGGGCTGACCGCT 
TCCTCGTGCTTTACGGTATCGCCGC 

TCCCGArrCGCAGGGGATCGCClTCTATCGCC'rTCTTGACGAG'rrC'nGTGAGCG 
GGACTCTGGGGTTCGAAATGACeGA 

CCAAGGGACGCCCAACCTGCCATCACGATGGCCGGAATAAAATATCTTTATTTTC 
ATTACATCTGTGTGTTGGTTTTTTGTGTGAAGATCCGCGTATGGTGCACTCTC- 
AGTACAATCTGCTaraATGCCGOA^TAGTTAAGCCAGCCCCGACACCCGCCAA 

CACCCGGTX3rACG<XKXXH*j^ 

AGCTGTGACCGTCTCCGGGAGCTGC 

ATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGACGAAAGGOCCTCG 
TGATACGCCTATTTTTATAGGTTAA " 

TGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCG 

CGCGGAACCCCTATTTCTTTATTTT 

TCTAAATACATTCAAATATXHA^^ 

TCAATAATATTGAAAAAGGAAGAGT 

ATCAGTATTCAACATTTCCGTGTCGCC 

TCCTGTTTTTGCTCACCCAGAAAC 

GCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACAl^ 

cgaactggatctcaacagcggtaaga 

tccttgagagttttcggcccgaagaacgt^ 

tctgctatgtggcggggtattatcc 

cgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatg 
acttggttgagtactcaccagtcac 
agaaaagk:atcitacggatggcat^^ 
accatgagtgataacactgcggcca 

acttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaa 
catgggggatcatgtaactcgcctt 

gatcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacacc 
acgatgcctgtagcaatggcaacaac 

gttgcgcaaactattaactgkk:gaactacttactctagcttcccggcaacaatta 
atagactggatggaggcggataaag 

ttgcaggaccacttctgcgctcggcccttccggctggctggtttattgctgataa 
atctggagccggtgagcgtgggtct 

cgcggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagtta 
tctacacgacggggagtcaggcaac 

tatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcat 
tggtaactgtcagaccaagtttact 

catatatactttagattgatttaaaacttcatttttaatttaaaaggatc 
aagatcctttttgataatgtcatg 

accaaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaa 
agatcaaaggatcttcttgagatcc 

tttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcg 
gtggtttgtttgccggatcaagagc 

taccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatac 
tgtccttctagtgtagccgtagtta 

ggccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcc 
tgttaccagtggctgctgccagtgg 

cgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcg 
cagcggtcgggctgaacggggggtt 

cgtgcacacagcccagcttggagcgaacgacctacaccgaactgagatacctac 
agcgtgagctatgagaaagcgccacgc1tcccgaagggagaaaggcggacaggt 
atccggtaagcggcagggtcggaacaggagagcgcacgagggagcitccagg 

GGGAAACGGCTGGTATGIlTATAGTCCTGTCGGGIlTCGCCACCrClTjAC'riTrAG 
CGTCGATTTTTGT GATGCTCGTCAG 

GGGGGCGGAGCCTATGGAAAAACGGCAGCAACGCGGCCTTTTTACGGTTCCTGG 
CCTTTTGCTGGCCTTTTGCTCAC ATGGCTCGAC3 ' 

AGATCTrcAATATTKKKX^T^ 
CTATTGGCCATTGCATACXnTGTATCTATAlX^^ 

ATATGACC<K:CATGnrrcGCATTGATTATTGACTAGTlATTAATAGTAATC 
^AGTTCATAGCCCATATAIXKJAGITCXXjCGTTACATAACTTACGGTAAA^ 
TGACCGCCCAACGACCCCCGCCCATTGAC<]riX^TAATGACGTATGT^ 
ATAGKjGACTTTCCATIXtACGTCAATGGG 

CATCAAGTGTATCATATGCCAAGTCCGCCCCCrAlTGACGrCAATCACGGTAAATGGCCCGCC 

TGGCATTATGCCCAGTACATGACXJITACXKK5ACTTTCCT 

GTCAT03CTATTACX^TG<3TCATG<^^ 

GACTCACGGGGATlTCCAAGTCTCCACCCCATTGAanX^ 

AATCAACGGGACTTTCCAAAATGTCX^ 

GGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTC^ 

C^CTAGAAGCITTATTGCGGTAGTTTATCACAGTTAAA^ 

C^CAACAGTCTXXxAAClTAAGCTGC^GTGACn^ 

GCGCTATATGCGTTGATGCAATTTCTATOK^ 

GGCCGCCGCCCAGTCCTGCKXiC^^ 

ACCACACCCGTCi2TGTGGATCX^CTACGCCGGACGCATCGTGGC<X?GCATCACCGGC^ 

GGl^GCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGATCGGGCTCGjGGA 

GGGCTCATGAGCGCTTGTTIXXjGCTCTCTTAAGGTAGC^ 

CTCATGTTTGACAGCTTATCATCGCAGATCCTGAGCTTGTATGGTGCACTCT^ 

GCTCTGCTGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGT 

AGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATrGCATGAAGAAT 

CTGCTTAGGGTTAGGGGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTATCTGA 

GGGGACTAGGGTGTGTTTAGGCGCCCAGGGGGGCITCGGTTGTACGCGGTTAGGAGTCCCCTC 

AGGATATAGTAGTTTCGCTTTTGCATAGGGAGGGGGAAATGTAGTCTTATGCAATACACTTGT 

AGTCTTGCAACATGGTAACGATGAGTTAGCAACATGCCTTACAAGGAGAGAAAAAGCACCGT 

GCATGCCGATTGGTGGAAGTAAGGTGGTACGATCGTGCCTTATTAGGAAGGCAACAGACAGG 

TCTGACATGGATTGGACGAACCACTGAATTCCGCATTGCAGAGATAATTGTATTTAAGTGCCT 

AGCTCGATACAATAAACGCCATTTGACCATTCACCACATTGGTGTGCACCTCCAAGCTGGGTA 

CCAGCTGCTAGCCTCGAGACGCGTGATTrCCTTCGA^ 



tfcaaggaacctccacaaggagctcattttctttc^^ 



tttccagaaattgatttggagaaatataaacttctgc^^ 

tgagaagaatgattaatCGATCTTAAGTTTAATCTTTCCCGGGGGTACCGTCGACTGCGGCCGCGAATTC 

CAAGCTTGAGTATTCTATCGTGTCACCTAAATAACTTGGCGTAATCATGGTCATATCTGTTTCC 

TGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTA 

AAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCGATGCTTCCATIT 

TGTGAGGGTTAATGCTTCGAGAAGACATGATAAGATACATTGATGAGTTTGGACAAACCACA 

ACAAGAATGGAGTGAAAAAAATGCTTrATTTCTGAAATITGTGATGGTAT^ " 

ACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTTATGTITCAGGTT 

CAGGGGGAGATGTGGGAGGTTTTTTAAAGCAAGTAAAACCTCTACAAATGTGGTAAAATCCG 

ATAAGGATCGATTCCGGAGCCTGAATGGCGAATGGACGCGCCCTGTAGCGGCGCATTAAGCG 

CGGCGGGTGTGGTGGTTACGCGCACGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCC 

TTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGC1TTCCCCGTC 

GGGCl"CCClT r rAGGGTTCCGATriAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAG 
GGTGATGGT'rCACGTAGTGGGCCATCGCCCTGATAGACGG'ITTTT'CGCCCTTTGACGTTGGAG 
TCCACGT^rCTlTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTC 
TATTCTTTTGATTTATAAGGGATTTTGCCGATTTC 

AACAAAAATlTAACGCGAAITTrAACAAAATATTAACGCTTACAAlTTCGCCTGTGTACCITC 
TGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGITAGGGTGTGGAAAGl'CCCCAGGCTC 
CCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGT 
CCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATA— 



OTCOCGCCOTAACTX^^ 

AGAAGTAGTGAGGAGGCTTTTTTC^^ 
CAACAGTCTCGAACTTAAGGCTAGAGCCA 

tctgatgccgccgtgttccgg 

ctgtccggtgco^gaatgaactgcagga(x]^ 

gggcgttccttgcgcagctgtgctcgacgttgtcactgaagcgggaagggactggct 
g«3Kkxjaagtgccggg<k:aggatci^^ 

CATGGCIXjATGCAATGCGG(XjGCTGCATACGCTTGATCCXa 

AGCGAAACATCGCATTOAGCGAGCACGTACTCGGATCGAAGCCGGTC^ 

ATCTXX^CGAAGAGC^TCAGGGG^ 

ATGCpDGAOSGCGjA^ 

CbAAAATGCKXXKHTriX^XKJATO 

GACATAGCGlTGGCTACCXiGTGATAlTGCrGAAGAGCTTGGCGGCGA ATG GGCTGACCGCTTC 

CTCtfiraTTTACGCT 

AGrraTCTGAGCGGGACTCTGGGG^ 

catoatggccgcaataaaatatcti^^^ 
atccg<otatggtgcactctcagtacaatctc 

-cac<xgccaacacccgctgacgcgccctgacgggcttgtctgctccgggcatccgcitacaga 

CAAGCTGTGACCGTCTCCGGGAGCrGCATGTCTCAGAGGTTTTCACCGTCATCACCGAAACGC 
GCGAGACGAAAGGGCCTCGTGATACGCCTATITTTATAG<nTAATGTCATGATAATAATGGri^ 
TCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCT 

AAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCITCAATAATATT 
GAAAAAGGAAGAGTATGAGTATTCAACATTTCCGT^ 

" TTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATC^ 
TGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTnTC 
GCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCG 
CCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGG 
TTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGC 
AGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGG 
ACCGAAGGAGCTAACCGCTTTTTTGCACAACATGKJGGG^ATCATGTAACTCGCCTTGATCGTTG 
GGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAA 
TGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAAT 
TAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACITCTGCGCTCGGCCCrTCCGGCT 
GGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCA 
CTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAAC 
TATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAAC 
TGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTm 
GATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTm 
CCACTGAGCGTCAGACCCCGTAGAAAAGATCAA^ 

CGTAATCTGCTGCITGCAAACAAAAAAACCACCGCrACCAGCGGTGGTCTGrn^GCCGGATCA 

AGAGCTACCAACrCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGT 

CCTTCTAGTGTAGCCGTAG'rTAGGCCACCACTTCAAGAACTCTG'rAGCACCGCCTACATACCT 

CGCTCTGCTAATCCrG'rTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCrTACCGGGlT 

GGACTCAAGACGATAGITACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCA 

CACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGA 

GAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCG 

GAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTrTATAGTCCTGTC 

GGGTTTCGCCACCTCTGACTrGAGCGTCGAlTTTTGTGATGCTCGTCAGGGGGGGGGAGCCrA 

TGGAAAAACGCCAGCAACGCGGCCTTTTTA(XXHTC^ r 

ATGGCTCGAC 



<MTCTTCAATATTXKKXJATTAGKX 

ATTGGCCATTGCATACGITGTATCTATATCATAA^^ 

ATGACCGCCATGI^XjGCATTGATTATTGACTAGIT^ 

AGTTCATAGCCCATATATGGAGTIXXXJCGTTACATAACTTACGGTAA 

ACCGCCCAACGACCCCCGC<X^7TGACGTCAATAATC^ 

AOraGACTTTCCATTGACGTCAATGGOT 

TCAAGTGTATCATATGCCAAGTCCGCCCCCTATTGACGTCAATGACGGTAAATGGCGCGCCTG 

GGATTATGCGCAGTACATGACCTrrACGGGACT 

CATCX3CTATTACCATGGTCM 

CTCACXjGGGATrTCCAAGTCTCCACCGCATreA 

TCAA(XjGGACTTTCCAAAATGTCGTAACAACTCC^ 

CXKH'AGGCGTGTACXKSarGGGAGGTCTAT^^ 

CTGAATTCTGACGACCTACTGATTAACGGCCATAGAGGCCTCCTG 

ATTGCGOTAGTTTATCACAGTTAAATTGCT 

AACTTAAGCTGCAGTCACTCTC^ 

cctatctggccagttagcagtcgaagaaag^ 
gcccgaagtggcgagcccgatcttccccatc^ 

gcacctgtggcxk^cggtgatgccggccacgatgcgtccggcgtagaggatccaca<kjac^ 

tgtggtcgccatgatcgcgtagtcgatagtggctccaagtagcgaagcgagcaggactgggc 

ggcggccaaagcggtcggacagtgctccgagaacgggtgcgcatagaaattgcatcaacgca 

tatagcgctagatccttgctagagtcgagatctgtcgagccatgtgagcaaaaggccagcaa 

aaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccgcccccctgac 

gagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagata 

ccaggcgtttccccctggaagctccctcgtgcgctctcct 

atacctgtccgcctttctcccttcgggaagcgtggcgctttct 

ctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacgaaccccccgttcagccc 

gaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgac^atcg 

ccactggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacaga 

GTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCT
GCTGAAGCCAGTTACCITCGGAAAAAGAGTTGGm 

CTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAA 

GAAGATCCTTTGATCTTTTCTACGGGGTCTGACG 

ATTTTGGTCATGAGATTATCAAAAAGGATCTTC^ 

gtaaggagaaaatarcgcatcaggaaattgtaagcgttaataattcagaagaactc^^ 

ggcgataccgtaaagcacgaggaagcggtcagcccattcgrcgccaagctcttcagcaatatcacggg^ 

cacacccagccggccacagtcgatgaatccagaaaagcggceatt 

gccgtcgggcatgctcgccttgagrctggcgaacagttcggctggcgcgagc^^ 

tccgagtacg£gctcgc*cgatgcgatgtttcgcttggtg^ 

gatactttctcggcaggagcaaggtgagatgacaggagatcctgc^ 

gcacagctgcgcaaggaacgcccgtcgtggccagcx^cgatagccgc^^ 

aaagaaccgggcgcccctgcgctgacagccggaacacggcggcatcagagra^ 

aagcggccggagaacctgcglgcaatccatcttgUcaalcatgc^^ 

ggcggcgagaaagccatccagtltactttgcagggrttgtcaaccttaccagatAAAAGTGCTCATCATIGGAAA^ 

TTcTGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGG 

CTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAA 

AGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACC 

ATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCG 

CCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTA

TTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGC

GACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATGATTGAACAAGATGGATTGCACGCA
GGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGG
CTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGAC
CGACCTGTCCGGTGCCCTGAATGAACTGCAGGACGAGGCAGGGCGGCTATCGTGGCTGGCCA 
CGACGGGCGTTCCTTGGGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTG'- 






CTA1TCGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCT 

TCCATCATGGCTGATGCAATGCGGCGGCTGCATAC^^ 

CACCAAG<^AAACATC<3CATCGAGCGAGCAC^ 

GGATGATCTGGACGAAGAG<^TCAG<KKKJn^^ 

CGCGCATGCCCGACGGCXJAGGATCTCGTC^^ 

TGGTGGAAAATGGCCGCITITCTGGATTCAT^ 

ATCAG<?ACATAGGGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCTGAC 

CGCTTCCTCGTGCTTTACCK^^ 

TTGACGAGccaTTOgatggaggtagC^^ 

CTCATACITCTGCCTGCTGGG^ 

GCTGGTTCITTCCGCCTCAGAAGGT^ 

TTXXKXnTAAATTTTTCTTAAATCAGCT^ 

CCTTATAAATCAAAAGAATAGACCGAGATAGGGTlXjAGTGTtGTTCCA 
TCCACTATTAAAGAACGTGGACTCCAACCT 



GATCITC^TATTXKKX^TC^ 
ATTXjGCCA7"TGCATACGTTGTA1T^ATAT^ 
ATGACCGCCWTGTTGGCATTGATTATTXl^CT 
AGTTCATAGCCCATATATGGAGTTCC<XXHTA^ 
ACCGCCCAACGACXDCCCGCCCATraACXH^AAT^ 
AGGGACITIXXIATTGACGTCAATGGGTG^ 
TCAAGTGTATCATATGCCAAGTCCGCCCCCTATTGACGTC 
GCATTATGCXX1AGTACATGACCTTA(XKK^^ 
CATCGCTATrACCATGGTGAlXX:GGTTTTGGCACT 
CTCACGGGkj^TTTCCAAGTCT 
TCAACXjGGACTTTCCAAAATGTCCT 
CGGTAGGC<jTGTAC<jGTGGGAGGTCTATATAAGC^ 
GCITrATTGCGGTA<mTATCA^ 
' TCTCXxAAClTAAGCTGCAGTGACTCiX^AA 
AGAGGCCTATCTGGCCAGlTAGCAGTCGAAGAAAGAAGTTTAAGAGAG<XX5AAACAAGCGCl t 
CATGAGCCCGAAGTGGCGAGCCCGATCnXXXXATCGGTGATGl^CGGCGATATAGGCGCCAG 
CAACCGCACCTGTGGCGCXXK]^^ 
AOKKnxSTGGTCXKX^ATGATCGCGTAG^ 
TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCC^^ 

ACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTGTCGAGCCATGTGAGCAAAAGGCC 

AGGAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCC 

CCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATA

AAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCT

TACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCA

AGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTT

CAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGAC 

TTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGC 

TACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTG

CGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAA 

CCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGA 

TCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGT 

TAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTatcggtgtgaaataccg 

cacagatgcgtaaggagaaaataccgcatcaggaaattgtaagcgtta^

tcgggagcggcgataccgtaaagcacgaggaagcggtcagcccatt^^

cggtccgccacacccagccggccacagtcgatgaatc^



agatccttggcggcgagaaagccatccagtttactttgcagggcttgt^

TCAATTcTGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCC

AGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTG

GAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCA

ACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCT 

CCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAG

CTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTTGATTCT

TCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATGATTGAACAAGATGGATTGCAC

GCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAAT

CGGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAA

GACCGACCTGTCCGGTGCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGG

CCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGG

CTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCTCCTGCCGAGAAA" 



GTATCC^TCATGGCnDGAT(X5AA^ 

GACCACCAAGCGAAAGAIYXJCATCXM^ 

TCAGGATCATCTGGACGAAGAGCATCAGGGGCT^^ 

A(XXXXXX1ATG<XXXM<^^ 

ATCATGGTGGAAAATGGCCGCTITIXITC^ 

CGCTATCAGGACATAGCGTTGGCTACCCCT 

TGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGC 
COTCITGACGAGccaTTCtgctggatggCTacAGCT 
ttttgcatacctaatcattatgctgaggatttggaaagggtgt^ 
atgggaggccatetcattgtagccctctgtgtgc^^ 
gatecattcctatgactgtagattttafxxigactgaagagc^ 
actggaaagaatgtcttgattgtggaagatataaltgacactggcaaaaca^ 
tcgcaagcttgctggtgaaaaggaccccacgaagtgttggat^^ 



TGCTGACTAATTGAGATGCATGCTTTGCATACTTCT 
ACACCCTAACrGACACACATTCCACAGCreGITCITrCCGCCTCAGAAGG'rACACAGGCGAAA 
TTGTAAGCGTTAATATTTTGTTAAAAT^ 

CCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGA 
GTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGG 
CGAAAAACCGTCTATCAGGGCGATGGCCCAC 
CACCTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGT

TAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTAT 

AAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAA 

CAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAA 

CCGTCTATCAGGGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTT 

TTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGC 

CCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGA 

AGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCG 

GTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAG 

GGCGCGTCCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATC 

GGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTG 

CAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTA 

AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTGGGT 

ACaattcaattcgtcgacctcgaaattctaccgggtag 

cagccccgctgggcacttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaacc
ggctccgttctttggtggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcg
tcgtgcaggacgtgacaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagc 
gggtaggcctttggggcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtcc 
gggggcgggctcaggggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacg 
cttcaaaagcgcacgtctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagca
gctgaagcttaccatgaccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcac
cctcgccgccgcgttcgccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccga
gctgcaagaactcttcctcacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggc
ggtctggaccacgccggagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcg 
gttcccggctggccgcgcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttcctt 
ggcccaccgtcgggcgtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggagg 
cggccgagcgcgccggggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggctt 
caccgtcaccgccgacgtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcc
cgccccacgacccgcagcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatca
aggttagcGATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGC 
ATAAATCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAAT 
ATGTACATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGA 
TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGC 
CCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGC 
TGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCC 
ATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTA 
CGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCG 
CCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAG 
TACATGACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTC 
ATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGA 
TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAAT 
GGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAAC 
AACTGCGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGG 
TGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGA 
AGCTTTATTGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGTGCT 
TCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCTtaattaaccaccgctac 
aggtgagtactcgGATCTGCTACCTTAAgagaggcctatctggccagttagcagtcgaagaaagaagtttaa 
GAGAGCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCC 
CCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCC- 



ggtgatgccggccacgatgcgtccggcgtagaggatccacaggacgggtg

tggtcgccatgatcgcgtagtcgatagtggctccaagtagcgaagcgagc 

aggactgggcggcggccaaagcggtcggacagtgctccgagaacgggtgc 

gcatagaaattgcatcaacgcatatagcgctagatccttgctagagtcgag 

gccgccaccgcggtggagctccagcttttgttccctttagtgagggttaat 

ttcgagcttggcgtaatcatggtcatagctgtttcctgtgtgaaattgtta 

tccgctcacaattccacacaacatacgagccggaagcataaagtgtaaag 

cctggggtgcctaatgagtgagctaactcacattaattgcgttgcgctcac 

tgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgaatcg 

gccaacgcgcggggagaggcggtttgcgtattgggcgctcttccgcttcct 

cgctcactgactcgctgcgctcggtcgttcggctgcggcgagcggtatcag 

ctcactcaaaggcggtaatacggttatccacagaatcaggggataacgca 

ggaaagaacatgtgagcaaaaggccagcaaaaggccaggaaccgtaaaa 

aggccgcgttgctggcgtttttccataggctccgcccccctgacgagcatc 

acaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataa 

agataccaggcgtttccccctggaagctccctcgtgcgctctcctgttccg 

accctgccgcttaccggatacctgtccgcctttctcccttcgggaagcgtg 

gcgctttctcatagctcacgctgtaggtatctcagttcggtgtaggtcgtt 

cgctccaagctgggctgtgtgcacgaaccccccgttcagcccgaccgctgc 

gccttatccggtaactatcgtcttgagtccaacccggtaagacacgactta 

tcgccactggcagcagccactggtaacaggattagcagagcgaggtatgt 

aggcggtgctacagagttcttgaagtggtggcctaactacggctacactag 

aaggacagtatttggtatctgcgctctgctgaagccagttaccttcggaaa

aagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtg

gtttttttgtttgcaagcagcagattacgcgcagaaaaaaaggatctcaag 

AAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACT 

cacgttaagggattttggtcatgagattatcaaaaaggatcttcacctaga 

tccttttaaattaaaaatgaagttttaaatcaatctaaagtatatatgagt 

aaacttggtctgacagttaccaatgcttaatcagtgaggcacctatctcag 

cgatctgtctatttcgttcatccatagttgcctgactccccgtcgtgtagat 

aactacgatacgggagggcttaccatctggccccagtgctgcaatgatacc 

gcgagacccacgctcaccggctccagatttatcagcaataaaccagccagc 

cggaagggccgagcgcagaagtggtcctgcaactttatccgcctccatcca 

gtctattaattgttgccgggaagctagagtaagtagttcgccagttaatag 

tttgcgcaacgttgttgccattgctacaggcatcgtggtgtcacgctcgtc 

gtttggtatggcttcattcagctccggttcccaacgatcaaggcgagttac 

atgatcccccatgttgtgcaaaaaagcggttagctccttcggtcctccgat 

cgttgtcagaagtaagttggccgcagtgttatcactcatggttatggcagc 

actgcataattctcttactgtcatgccatccgtaagatgcttttctgtgact 

ggtgagtactcaaccaagtcattctgagaatagtgtatgcggcgaccgag 

ttgctcttgcccggcgtcaatacgggataataccgcgccacatagcagaac 

tttaaaagtgctcatcattggaaaacgttcttcggggcgaaaactctcaag 

gatcttaccgctgttgagatccagttcgatgtaacccactcgtgcacccaa 

ctgatcttcagcatcttttactttcaccagcgtttctgggtgagcaaaaac 

aggaaggcaaaatgccgcaaaaaagggaataagggcgacacggaaatgt 

tgaatactcatactcttcctttttcaatattattgaagcatttatcagggtt 

attgtctcatgagcggatacatatttgaatgtatttagaaaaataaacaaa 

taggggttccgcgcacatttccccgaaaagtgc 






GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 
TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 
CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 
ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 
TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 
ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 
tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 
ctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctct
gctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagcta
caacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacggg
ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc 
ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 
cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 
gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 
agtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 
ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc 
ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 
agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 
attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 
cttactgaacaaccagaattagcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatga 
atcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttgg 
agaaatataaacttctgccagaatacccaggtgttctactgatgtccaggaggagaaaggcattaagtacaaatttgaagt 
atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 
actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca 
gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatca 
aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt 
caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 
acgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccata 
cggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggt 
ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 
acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 
actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 
ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 
gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 
atgtcgtcttacaccattgagtcgtctcccctttggaatggcccctggacccggcccacaacctggcccgctaagggagtc 
cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 



agcccgctcctacctgcaatatcagggtgactgt^

gtggaaggggctgccgcggagggtgatgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaag 
ggcaggagtgatgtaacttgttaggagacgccctcaatcgtattaaaa^
cagtagacatcatgcgtgctgttggtgtatttctggccatctgtcttgtcaccattttcg^
catacccatgttgtcacgtcactcagctccgcgctcaa

aatcagacatgcgacggctttagcctggcctccttaaattcacctaagaatgggagcaaccagcatgcaggaaaaggaca 
agcagcgaaaattcacgcccccttgggaggtggcggcatatgcaaaggatagcactcccactctactactgggtatcatat

gctgactgtatatgcatgaggatagcatatgctacccggatacagattaggatagcatatactacccagatatagattaggat
agcatatgctacccagatatagattaggatagcctatgctaccca^ 

ttaggatagcatatgctacccagatatagattaggatagcctatgctacccagatatagattaggatagcatatgcta
atatagattaggatagcatatgctatccagatatttgggtagtatatgctacccagatataaattaggatag
aatctctattaggatagcatatgctacccggatacagattaggatagcato^ 
ctacccagatatagattaggatagcctatgctacccagatataaattaggata^ 

gcatatgctacccagatatagattaggatagcctatgctacccagatatagattaggatagcatatgctatccagatatttgg 

gtagtatatgctacccatggcaacattagcccaccgtgctctcagcgacctcgtgaatatgaggaccaacaac^ 

ggcgctcaggcgcaagtgtgtgtaatttgtcctccagatc^

caggtattccccggggtgccattagtggttttgtgggcaagtggtttgaccgcagtggttagcggggttacaatcagccaa 
gttattacacccttattttacagtccaaaaccgc^gggcggcgtgtgggggctgacgcgtgcccccactccacaatttcaaa 
aaaaagagtggccacttgtctttgtttatgggccccattggcgtggagccccgtttaattttcgggggtgttagagacaacca 
gtggagtccgctgctgtcggcgtccactctctttccccttgttacaaatagagtgtaacaacatggttcacctgtcttggtc^

tgcctgggacacatcttaataaccccagtatcatattgcactaggattatgtgttgcccatagccataaattcgtgtgagatgg

acatccagtctttacggcttgtccccaccccatggatttctattgttaaagatattcagaatgtttcattcctacactagtatttatt

gcccaaggggtttgtgagggttatattggtgtcatagcacaatgccaccactgaaccccccgtccaaattttattctggggg 

cgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaagg 

agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 

gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 

gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 

atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 

ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt 

acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 

aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 

ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac

atgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt

cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC

GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAC 

TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 

GAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTG 

TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACA 

AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 

TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 

CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 

CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 

CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 

TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 

CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 

GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 

GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 

AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA

GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 

CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 

CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 

CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 

AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 

AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 

AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG 

AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 

CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 

GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 

TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 

TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 

CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 

ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 

CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 

GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 

GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 

GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 

GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 

TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 

CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 

GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt 

cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc 

acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt 

ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtg 

acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 

gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 

gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt 

ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga 

ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg

ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 

cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 

gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc 

gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc

gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg 

gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac 

gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca 

gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 

TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 

GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 

TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 

GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 

GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 

GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 

CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



gatcttcaatattggccattagccatattattcattggttatatagcataaa 
tcaatattggctattggccattgcatacgttgtatctatatcataatatgta

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 
ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 
TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 
ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 
tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 
ctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctct 
gctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagcta 
caacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacggg
ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc 
ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 
cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 
gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 
agtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 
ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc 
ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 
agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 
attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 
cttactgaacaaccagaattagcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatga 
atcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttgg 



atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 

actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca 

gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatca 

aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt 

caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 

acgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccata 

cggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggt 

ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 

acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 

actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 

ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 

gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 

atgtcgtcttacaccattgagtcgtctcccctttggaatggcccctggacccggcccacaacctggcccgctaagggagtc 

cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 



agcccgctcctacctgcaatatcagggtgactgt^

gtggaaggggctgccgcggagggtgatgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaag
ggcaggagtgatgtaacttgttaggagacgccctcaatcgtattaaaa^
cagtagacatcatgcgtgctgttggtgtatttctggccatctgtcttgtcaccattttcg^
catacccatgttgtcacgtcactcagctccgcgctcaa^

aatcagacatgcgacggctttagcctggcctccttaaattcacctaagaatgggagcaaccagcatgcaggaaaaggaca

agcagcgaaaattcacgcccccttgggaggtggcggcatatgcaaaggatagcactcccactctactactgggtatcatat

gctgactgtatatgcatgaggatagcatatgctacccggatacagattaggatagcatatactacccagatatagattaggat

agcatatgctacccagatatagattaggatagcctatgctaccca^^

ttaggatagcatatgctacccagatatagattaggatagcctatgctacccagatatagattagg

atatagattaggatagcatatgctatccagatatttgggtagtatatgctacccagatataaattaggatag

aatctctattaggatagcatatgctacccggatacagattaggatagca^ 

ctacccagatatagattaggatagcctatgctacccagatataaattaggata^

gcatatgctacccagatatagattaggatagcctatgctacccagatatagattaggatagcatatgctatccagatatttgg

gtagtatatgctacccatggcaacattagcccaccgtgctctcagcgacctcgtgaatatgaggaccaacaac^

ggcgctcaggcgcaagtgtgtgtaatttgtcctccagatc^

caggtattccccggggtgccattagtggttttgtgggcaagtggtttgaccgcagtggttagcggggttacaatcagccaa
gttattacacccttattttacagtccaaaaccgc^gggcggcgtgtgggggctgacgcgtgcccccactccacaatttcaaa

aaaaagagtggccacttgtctttgtttatgggccccattggcgtggagccccgtttaattttcgggggtgttagagacaacca 

gtggagtccgctgctgtcggcgtccactctctttccccttgttacaaatagagtgtaacaacatggttcacctgtcttggtc^

tgcctgggacacatcttaataaccccagtatcatattgcactaggattatgtgttgcccatagccataaattcgtgtgagatgg 

acatccagtctttacggcttgtccccaccccatggatttctattgttaaagatattcagaatgtttcattcctacactagtatttatt

gcccaaggggtttgtgagggttatattggtgtcatagcacaatgccaccactgaaccccccgtccaaattttattctggggg 

cgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaagg 

agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 

gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 

gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 

atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 

ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt 

acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 

aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 

ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac 

atgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt 

cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC 

GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAC 

TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 

GAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTG 

TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACA 

AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 

TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 

CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 

CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 

CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 

TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 

CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 

GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 

GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 

AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 

GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 

CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 

CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 

CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 

AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 

AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 

AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG

AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 

CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 

GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 

TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 

TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 

CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 

ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 

CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 

GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 

GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 

GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 

GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 

TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 

CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 

GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt

cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc

acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt

ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtg

acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 

gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 

gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt 

ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga

ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg 

ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 

cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 

gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc

gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc

gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg

gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac

gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca 

gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 

TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 

GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 

TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 

GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 

GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 

GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 

CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 
TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 
CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCAGATCTAAGCTAGCGCCGCCACCATGGGCC 
CTAAAAAGAAGCGTAAAGTCGCCCCCCCGACCGATGTCAGCCTGGGGGAC 
GAGCTCCACTTAGACGGCGAGGACGTGGCGATGGCGCATGCCGACGCGCT 
AGACGATTTCGATCTGGACATGTTGGGGGACGGGGATTCCCCGGGGCCGG 
GATTTACCCCCCACGACTCCGCCCCCTACGGCGCTCTGGATATGGCCGACT 
TCGAGTTTGAGCAGATGTTTACCGATGCCCTTGGAATTGACGAGTACGGTG 
GGGAATTCAGGTGAGTACTCGCTACCTTAAggcctatctggccgtttaaacagatgtgtataag 
agacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttgctagagtcgaccaattctc
atgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctctgctgccgcatagttaagcc 
agtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgac 
cgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacgggccagatatacgcgtatctga 
ggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtcccctcaggatatagtagtttcgc 
ttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaacgatgagttagcaacatgcc 
ttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgtgccttattaggaaggcaaca 
gacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtatttaagtgcctagctcgatacaata 
aacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagcctcgagacgcgtgatttcctt 
cgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatcggcaagaacggggacctgc 
cctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagtagaaggtaaacagaatctggt
gattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtagaattaatttagttctcagcagag
aactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaacttactgaacaaccagaatta
gcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatgaatcacccaggccatcttaaac 
tatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttggagaaatataaacttctgccag
aatacccaggtgttctctctgatgtccaggaggagaaaggcattaagtacaaatttgaagtatatgagaagaatgTTAA 
TTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagtactgttgtaattcattaagcat
tctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtata 
atatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccag 
ggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggttttcaccgtaacacgccacatctt 
gcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaaacgtttcagtttgctcatggaa 
aacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccatacggaattccggatgagcattc 
atcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatcc 
agctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttctttacgatgccattgggatatatca 
acggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgataactcaaaaaatacgcccggtag 
tgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcgccaaaTTAATTAAGG 
CGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaacttgggtcgccggtgtgttcgtat- 



atggaggtagtaagacctc^tttacaacctaaggcga^ 

cgtctcccctttggaatggcccctggacccggcccacaac^ 

tttacaaactcatatamgctgaggttttgaaggatgcga 

agggtgactgtgtgcagctttgacgatggagtagatttgcctccctggtttccacctatggtggaaggggctgccgcggag

ggtgatgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaagggcaggagtgatgtaacttgtta 

ggagacgccctcaatcgtattaaaagrcgtgtattccc^ 

ggtgtatttctggccatctgtcttgtcaccatttt^^ 

agctccgcgctcaacaccttctcgcgttggaaaacattagcgaca^ 

cctggcctccttaaattcacctaagaatgggagcaaccagcatgcaggaaaaggacaagcagcgaaaattcacgccccct
tgggaggtggcggcatatgcaaaggatagcactcccactctactactgggtatcatatgctgactgtatatgcatgaggata 
gratatgctacccggataragattaggatagcatatactacc^^ 

taggatagcctatgctacccagatataaatteggatagcatatactacccagatatagattaggatagcatatgcte 

tatagattaggatagcctatgctacccagatatagattaggatagcatatgctacccagatatagattaggatagca^ 

t<x^gatatttgggtagtatatgctacccagatataaattaggata^^ 

acccggata^gattaggatagcatatactaccc^gatatagattaggatagcatatgctacccagatatagattaggatag 

cctatgctacccagatat^attaggalagcatatactecccagatatagattaggatagcatatgctacccagatatagatta 

ggatagcctatgctacccagatategattaggatagcatatgctatccagatatttgggtagtatatgcta<xcatggcaaca 

ttagcccaccgtgctctcagcgacctcgtgaatatgaggaccaacaaccctgtgcttggcgctcaggcgcaagtgtgtgta 

atttgtcctccagatcgcagcaatcgcgcccctatcttggcccgcccacctacttatgcaggtattccccggggtgccatta 

gtggttttgtgggcaagtggtttgaccgcagtggttagcggggttacaatcagccaagttattacacccttattttacagtcc^ 

aaaccgcagggcggcgtgtgggggctgacgcgtgcccccact 

ttatgggccccattggcgtggagccccgtttaattttcgggggtgttagaga^ 

ccactctctttccccttgttacaaatagagtgtaacaacatggttcacctgtcttggtccctgcctgggacacatctta
ccagtatcatattgcactaggattatgtgttgcccatagccataaattcgtgtgagatggacatccagtctttacggcttgtcc 
ccaccccatggatttctattgttaaagatattcagaatgtttcattcctacactagtatttattgcccaaggggtttgtgagggtt 
atattggtgtcatagcacaatgccaccactgaaccccccgtccaaattttattctgggggcgtcacctgaaaccttgttttcga 
gcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaaggagaatgaagaagcaggcgaag 
attcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatggttcactaccctcgtggaatcctg 
accccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttaggacccttttactaaccctaattcga 
tagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtatatactactacccgggaagcatatg 
ctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgagggtccgcttatcggtagctacaca 
ggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggtacatgtcccccagcattggtgtaa 
gagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgcaaagtctgctccaggatgaaagcc 
actcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaacccctttgtgtttggtccccccccgt 
gtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattacatgcactgccccgaatacaaaac 
aaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagttcgtccggcggcggGCGGC 
CGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTCGCCATGATCGCGTA
GTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGACTGGGCGGCGGCCAA 
AGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATAGAAATTGCATCAAC 
GCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTGTCGAGCCATGTGAG 
CAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCG 
TTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCA 
AGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCC 
CCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGG 
ATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCA 
CGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGT 
GTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTAT 
CGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCC 
ACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGT- 



TCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTA
TCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTT 
GATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGC 
AGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTT 
CTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTG 
GTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTATCGGTGTGA 
AATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGAAATTGTAAG 
CGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGATAGAAGGCGATGCGC 
TGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGGAAGCGGTCAGCCCA 
TTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCCAACGCTATGTCCTG 
ATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATGAATCCAGAAAAGC 
GGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCGCCATGGGTCACGA 
CGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTGGCGAACAGTTCGG 
CTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCCTGATCGACAAGAC 
CGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGTTTCGCTTGGTGGT 
CGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCGCCGCATTGCATCA 
GCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAGATGACAGGAGATC 
CTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTCCCGCTTCAGTGAC 
AACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTGGCCAGCCACGATA 
GCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCGGACAGGTCGGTCT 
TGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCGGAACACGGCGGCA 
TCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCCGAATAGCCTCTCC 
ACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTGTTCAATCATGCGA
AACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCCCCTGCGCCATCAG 
ATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCAGGGCTTGTCAACC 
TTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgtcgacctcgaaattctaccggg 
taggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggcacttggcgctacacaagtggc 
ctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggtggccccttcgcgccaccttcta 
ctcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtgacaaatggaaatagcacgtctc 
actagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggggcagcggccaatagcagcttt 
gctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcaggggcgggctcaggggcggg 
gcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgtctgccgcgctgttctcctcttc 
ctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatgaccgagtacaagcccacggt
gcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcgccgactaccccgccacgcg 
ccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcctcacgcgcgtcgggctcgac
atcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccggagagcgtcgaagcggggg 
cggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgcgcagcaacagatggaaggcc
tcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggcgtcttcgcccgaccaccaggg 
caagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccggggtgcccgccttcctggaga 
cctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgacgtcgaggtgcccgaaggacc
gcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgcagcgcccgaccgaaaggagcg 
cacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGCTAACCTGGTTGCT 
GACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGGG 
GACTTTCCACACCCTAACTGACACACATTCCACAGCTGGTTCTTTCCGCCTC 
AGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCG 
TTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGC 
AAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTT 
CCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAA 
GGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 

ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 

ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 

ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 

ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 

ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 

ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 

GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 

TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 

CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 

GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 

ACGACCTACTGATTAACGGCCAGATCTAAGCTAGCTTCCTGAAAGATGAAG 

CTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTC 

AAGTGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTG 

GGAGTGTCGCTACTCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACA 

TCTGACAGAAGTGGAATCAAGGCTAGAAAGACTGGAACAGCTATTTCTACT 

GATTTTTCCTCGAGAAGACCTTGACATGATTTTGAAAATGGATTCTTTACA 

GGATATAAAAGCATTGTTAACAGGATTATTTGTACAAGATAATGTGAATAA 

AGATGCCGTCACAGATAGATTGGCTTCAGTGGAGACTGATATGCCTCTAAC 

ATTGAGACAGCATAGAATAAGTGCGACATCATCATCGGAAGAGAGTAGTA 

ACAAAGGTCAAAGACAGTTGACTGTATCGCCGGAATTCAGGTGAGTACTC 

GCTACCTTAAggcctatctggccgtttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTC 

TCTTATACACATCTagatccttgctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagct 

tgtatggtgcactctcagtacaatctgctctgctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgc 

tgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttag 

gcgttttgcgctgcttcgcgatgtacgggccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcgg 

ggcttcggttgtacgcggttaggagtcccctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatg 

caatacacttgtagtcttgcaacatggtaacgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgcc 

gattggtggaagtaaggtggtacgatcgtgccttattaggaaggcaacagacaggtctgacatggattggacgaaccact 

gaattccgcattgcagagataattgtatttaagtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtg 

cacctccaagctgggtaccagctgctagcctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgc 

atcgtcgctgtgtcccagaacatgggcatcggcaagaacggggacctgccctggccaccgctcaggaatgaattcagata 

tttccagagaatgaccacaacctcttcagtagaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattc 

ctgagaagaatcgacctttaaagggtagaattaatttagttctcagcagagaactcaaggaacctccacaaggagctcatttt

ctttccagaagtctagatgatgccttaaaacttactgaacaaccagaattagcaaataaagtagacatggtctggatagttgg 

tggcagttctgtttataaggaagccatgaatcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaa 

gtgacacgttttttccagaaattgatttggagaaatataaacttctgccagaatacccaggtgttctctctgatgtccaggagg 

agaaaggcattaagtacaaatttgaagtatatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaat 

tacgccccgccctgccactcatcgcagtactgttgtaattcattaagcattctgccgacatggaagccatcacagacggcat 

gatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaag 

aagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaat 

aaaccctttagggaaataggccaggttttcaccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcg 

tcgtggtattcactccagagcgatgaaaacgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatat 

caccagctcaccgtctttcattgccatacggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgg 

ataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagc- 



aactgactgaaatgc«tcaaaatgttctt^^ 

agcttccttagctcctgaaaatctcgataactca^ 

tcttacgtgccgatcaacgtctcattttcgcc^ 

tagaaaggactaccgacgaaggaacttgggtcgccggtgtgttcg^ 

ggcgaggaactgcccttgctattccacaatgtcgt^ 

crcacaacctggcccgctaagggagtcrattgtct^ 

gatgcgattaaggaccttgttatgacaaag(xcgctcctacctgc^tatcagggtgactgtgtgcagctttgacgatggag 

tagatttgcctccctggtttccacctatggtggaaggggctgccgcggagggtgatgacggagatgacggagatgaagg

aggtgatggagatgagggjgaggaagggcaggagtgatgtaacttgttaggagacg^ 

tattcccccgractaaagaataaatccccagtagacat^ 

tcgtortcccaacatggggcaattgggcatacc^ 

aaacattagcgacatttacctggtgagcaatcagacatgcgacggctttagcctggcctccttaaattcacctaagaatggg
agcaaccagcatgcaggaaaaggacaagcagcgaaaattcacgcccccttgggaggtggcggcatatgcaaaggatag
cactcccactctactactgggtatcatatgctgactgtatatgcatgaggatagcatatgctacccggatacagattaggata 
gcatatactacccagatatagattaggatagcatatgctacc^ 

aggatagcatatactacccagatatagatteggategcatatgctacccagatatagattaggatagcctatgctacccaga^ 

atagatUggatagcatatgctacc^gatatagattaggatagcatatgctatccagatatttgggtagtatatgctacc^ 

atataaattaggatagcatatactaccx^taatctctattaggatagcatatgctacccggatacagattaggatagcatatact 

acccagatatagattaggatagcatatgctacccagatatagattaggatagcctatgctacccagatataaattaggatagc 

atatactacccagatatagattaggatagcatatgctacccagatatagattaggatagcctatgctacccagatatagatta 

ggatagcatatgctatccagatatttgggtagtatatgctacccatggcaacattagcccaccgtgctctcagcgacctcgtg 

aatatgaggaccaacaaccctgtgcttggcgctcaggcgcaagtgtgtgtaatttgtcctccagatcgcagcaatcgcgcc 

cctatcttggcccgcccacctacttatgcaggtattccccggggtgc^attagtggttttgtgggcaagtggtttgaccg 

tggttagcggggttacaatcagccaagttattacacccttattttacagtccaaaaccgcagggcggcgtgtgggggctga 

cgcgtgcccccactccacaatttcaaaaaaaagagtggccacttgtctttgtttatgggccccattggcgtggagccccgttt 

aattttcgggggtgttagagacaaccagtggagtccgctgctgtcggcgtccactctctttccccttgttacaaatagagtgt 

aacaacatggttcacctgtcttggtccctgcctgggacacatcttaataaccccagtatcatattgcactaggattatgtgttg 

cccatagccataaattcgtgtgagatggacatccagtctttacggcttgtccccaccccatggatttctattgttaaagatattc 

agaatgtttcattcctacactagtatttattgcccaaggggtttgtgagggttatattggtgtcatagcacaatgccaccactga 

accccccgtccaaattttattctgggggcgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactc 

agcagttattctattagctaaacgaaggagaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatc 

ttcagccactgcccttgtgactaaaatggttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcat 

ggggtgggagatatcgctgttccttaggacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgct 

attgaattagggttagtctggatagtatatactactacccgggaagcatatgctacccgtttagggttaacaagggggcctta 

taaacactattgctaatgccctcttgagggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcc

cgtagtcttcctgggcccctgggaggtacatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggc

aatgttgtgttgcagtccacagactgcaaagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccattta 

taaggatgtcaactacagtcagagaacccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggca 

agttgtaccaaccaactgaagggattacatgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaagg

ggcagagatgccgtagtcaggtttagttcgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCC 

ACAGGACGGGTGTGGTCGCCATGATCGCGTAGTCGATAGTGGCTCCAAGT 

AGCGAAGCGAGCAGGACTGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCC 

GAGAACGGGTGCGCATAGAAATTGCATCAACGCATATAGCGCTAGATCCT 

TGCTAGAGTCGAGATCTGTCGAGCCATGTGAGCAAAAGGCCAGCAAAAGG 

CCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCC 

CCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAAC 

CCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTG 

CGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCC 

CTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGT- 





TCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTT 
CAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCG 
GTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAG 
CAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTA 
ACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGC 
CAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCA 
CCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAA 
AAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTC 
AGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAA 
AGGATCTTCACCTAGATCCTTTTATCGGTGTGAAATACCGCACAGATGCGT 
AAGGAGAAAATACCGCATCAGGAAATTGTAAGCGTTAATAATTCAGAAGA 
ACTCGTCAAGAAGGCGATAGAAGGCGATGCGCTGCGAATCGGGAGCGGCG 
ATACCGTAAAGCACGAGGAAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCA 
GCAATATCACGGGTAGCCAACGCTATGTCCTGATAGCGGTCCGCCACACCC 
AGCCGGCCACAGTCGATGAATCCAGAAAAGCGGCCATTTTCCACCATGATA 
TTCGGCAAGCAGGCATCGCCATGGGTCACGACGAGATCCTCGCCGTCGGG 
CATGCTCGCCTTGAGCCTGGCGAACAGTTCGGCTGGCGCGAGCCCCTGATG 
CTCTTCGTCCAGATCATCCTGATCGACAAGACCGGCTTCCATCCGAGTACG 
TGCTCGCTCGATGCGATGTTTCGCTTGGTGGTCGAATGGGCAGGTAGCCGG 
ATCAAGCGTATGCAGCCGCCGCATTGCATCAGCCATGATGGATACTTTCTC 
GGCAGGAGCAAGGTGAGATGACAGGAGATCCTGCCCCGGCACTTCGCCCA 
ATAGCAGCGAGTCCCTTCCCGCTTCAGTGACAACGTCGAGCACAGCTGCGC 
AAGGAACGCCCGTCGTGGCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCA
GTTCATTCAGGGCACCGGACAGGTCGGTCTTGACAAAAAGAACCGGGCGC
CCCTGCGCTGACAGCCGGAACACGGCGGCATCAGAGCAGCCGATTGTCTG 
TTGTGCCCAGTCATAGCCGAATAGCCTCTCCACCCAAGCGGCCGGAGAACC 
TGCGTGCAATCCATCTTGTTCAATCATGCGAAACGATCCTCATCCTGTCTCT 
TGATCAGAGCTTGATCCCCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCC 
ATCCAGTTTACTTTGCAGGGCTTGTCAACCTTACCAGATAAAAGTGCTCAT 
CATTGGAAAAcattcaattcgtcgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctgga 
gcatgcgctttagcagccccgctgggcacttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggt 
aggcgccaaccggctccgttctttggtggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccc 
cgcanctcgcgtcgtgcaggacgtgacaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctga 
gcaatggagcgggtaggcctttggggcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaag 
gggtgggtccgggggcgggctcaggggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccgg 
cattctgcacgcttcaaaagcgcacgtctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctag 
atctcgagcagctgaagcttaccatgaccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggc 
cgtacgcaccctcgccgccgcgttcgccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcg 
ggtcaccgagctgcaagaactcttcctcacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgc 
cgcggtggcggtctggaccacgccggagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggcc 
gagttgagcggttcccggctggccgcgcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccg 
cgtggttccttggcccaccgtcgggcgtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccg 
gagtggaggcggccgagcgcgccggggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagc 
ggctcggcttcaccgtcaccgccgacgtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtg 
cctgacgcccgccccacgacccgcagcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcagg 
taagtatcaaggttagcGGCCGCTAACCTGGTTGCTGACTAATTGAGATGCATGCTTT 
GCATACTTCTGCCTGCTGGGGAGCCTGGGGACTTTCCACACCCTAACTGAC 
ACACATTCCACAGCTGGTTCTTTCCGCCTCAGAAGGTACACAGGCGAAATT 
GTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGC- 



TCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAA 
GAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCC 
ACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATC 
AGGGCGATGGCCCAC 



tcaacgacaggagcacgatcatgcgcacccgtggcra 

agatggcggacgcgatggatatgttctgccaagggttgg^ 

cttggagtggtgaatccgttagcgaggtgccgrcggcttcca^ 

caacgcggggaggcagacaaggtatagggcggcgcx^caatccatgccaacccgttccatgtgctcg 
ataaatcgccgtgacgatcagcggt<xagtgatcgaagtta^ 

gatggtcgtcatctacctgcctggacagcatggcctgc^cgcgggc^tcccgatgccgccggaagcgagaagaatcat 
aatggggaaggccatccagcctcgcgtcgcgaacgccagcaagacgtagcc^^ 

taatggcctgcttctcgccgaaacgtttggtggcgggaccagtgacgaaggcttgagcgagggcgtgcaagattccgaat 
accgcaagcgacaggccgatcatagtcgcgctccagcgaaagc^ 
acctgtcctacgagttgcatgataaagaagacagtcataa^ 
agctgactgggttgaaggctctcaagggcatcggtcgacgrt^ 

gtaggttgaggccgttgagcac^gccgccgcaaggaatggtgcatgc^ggagatggcgcccaacagtcxx^ 
cggggcctgcx^ccatacccacgccgaaacaagcgctcatgagcrc^ 

gtcggcgatataggcgccagcaaccgcacctgtggcgccggtgatgccggccacgatgcgtccggcgtagaggatcca 
caggacgggtgtggtcgcx^atgatcgcgtagtcgatagt^^^ 

^gcggtoggacagtgctccgagaacgggtgcgcatagaaattgcatcaacgcatatagcgctagcagcacgccatag 
tgactggcgatgctgtcggaatggacgatatcccgcaagaggcccggc^^ 

catccagggtgacggtgc^gaggatgacgatgagcgcattgttagatttcatacacggtgcctgactgcgttagcaatttaa 
ctgtgataaactaccgcattaaagcttatcgatttccacacattatacgagccgatgttaattgtcaacagctcatgcatgacg 
tcccgggagcagacaagcccgtcagggcgcgtcagcgggtgttggcgggtgtcggggctggcttaactatgcggcatc 
agagcagattgtactgagagtgcaccatatgcggtgtgaaataccgcacagatgcgtaaggagaaaataccgcatcaggc 
gccattcgccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctattacgccagctggcgaaa 
gggggatgtgctgcaaggcgattaagttgggtaacgccagggttttcccagtcacgacgttgtaaaacgacggccagtga
attcGAGCTCaTACTTCGAATAGGGATAACAGGGTAATGCGATagcggccgcaatCG
CTCTCTTAAGGTAGCccgtgcTGGCAAACAGCTATTATGGGTATTATGGGTGG 
GCCCTAGAAAGCTTggcgtaatcatggtcatagctgtttcctgtgtgaaattgttatccgctcacaattccacac 
aacatacgagccggaagcataaagtgtaaagcctggggtgcctaatgagtgagctaactcacattaattgcgttgcgctca 
ctgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgacccgcgaggtcgccgccccgtaaccccctacc 
gctgaaagttctgcaaagcctgatgggacataagtccatcagttcaacggaagtctacacgaaggtttttgcgctggatgtg 
gctgcccggcaccgggtgcagtttgcgatgccggagtctgatgcggttgcgatgctgaaacaattatcctgagaataaatg 
ccttggcctttatatggaaatgtggaactgagtggatatgctgtttttgtctgttaaacagagaagctggctgttatccactga 
gaagcgaacgaaacagtcgggaaaatctcccattatcgtagagatccgcattattaatctcaggagcctgtgtagcgtttat 
aggaagtagtgttctgtcatgatgcctgcaagcggtaacgaaaacgatttgaatatgccttcaggaacaatagaaatcttcg 
tgcggtgttacgttgaagtggagcggattatgtcagcaatggacagaacaacctaatgaacacagaaccatgatgtggtct 
gtccttttacagccagtagtgctcgccgcagtcgagcgacagggcgaagccctcgagtgagcgaggaagcaccaggga 
acagcacttatatattctgcttacacacgatgcctgaaaaaacttcx^ 

attattttttttatagtttttagatcttcttttttagagcgccttgtaggcctttatccatgctggttctagagaaggtgttgtgacaa 
attgccctttcagtgtgacaaatcaccctcaaatgacagtcctgtctgtgacaaattgcccttaaco:tgtgacaaattgccct 
cagaagaagctgttttttcacaaagttatccctgctta^^ 

atggatctgtcatggcggaaacagcggttatcaatcacaagaaacgtaaaaatagcccgcgaatcgtccagtcaaacgac 
ctcactgaggcggcatatagtctctcccgggatcaaaaacgtatgctgtatctgttcgttgaccagatcagaaaatctgatg 
gcaccctacaggaacatgacggtatctgcgagatccatgttgctaaatatgctgaaatattcggattgacctctgcggaagc 
cagtaaggatatacggcaggcattgaagagtttcgcggggaaggaagtggttttttatcgccctgaagaggatgccggcg 
atgaaaaaggctatgaatcttttccttggtttatcaaacgtgcgcacagtccatccagagggctttacagtgtacatatcaacc 
catatctcattcccttctttatcgggttacagaaccggtttacgcagtttcggcttagtgaaacaaaagaaatcaccaatccgt 
atgccatgcgtttatacgaatccctgtgtcagtatcgtaagccggatggctcaggcatcgtctctctgaaaatcgactggatc 
atagagcgttaccagctgcctcaaagttaccagcgtatgcctgacttccgccgccgcttcctgcaggtctgtgttaatgaga 
tcaacagcagaactccaatgcgcctctcatacattgagaaaaagaaaggccgccagacgactcatatcgtattttccttccg 
cgatatcacttccatgacgacaggatagtctgagggttatctgtcacagatttgagggtggttcgtcacatttgttctgacct- 



ax*gagggtaatttgtcacagtmgctgm 
caaatttgagggcagtttgtcacagttgara 
tgatgagggttgattatcaragtttattactctgaattggctat^ 
atttcttcttgcgctgagcgtaagagctatctgacagaacagttctte^ 

cacggctgcggcgagcgctagtgataataagtgactgaggtatgtgctcttcttatctccttttgtagtgttgctc^ 
caactttgcggttttttgatgactttgcgamtgttgttgctttg^ 

attaaaggatgttcagaatgaaactcatggaaacacttaaccagtgcataaacgctggtcatgaaatgacgaaggctatcg 

ccattgcacagtttaatgatgacagcccggaagcgaggaaaataacccggcgctggagaataggtgaagcagcgga 

agttggggtttcttctcaggctatcagagatgccgagaaagcagggcgactaccgcacccggatatggaaattcgaggac 

gggttgagcaacgtgttggttatacaattgaacaaatta^^ 

gacgtatttccaccggtgatcggggttgctgcccata^ 

ctggctctgaaggggctacgtgttttgctcgtggaaggtaacgacccccagggaacagcctcaatgte 

accagatcttcatattcatgcagaagaractctrctgrc^^^ 

ccacttgctggccggggcttgarattattccttcctgt^ 

aaggtaaactgc^ccgatcracacctgatgctccgactggrc^^ 

gacagcgcgrctocctgggtatcggcacgattaatgtc^ 

gtttgactaracctrcgcactgcagtttttcgatat^^^ 

gtacgtattttgcttaccaaatacagcaatagtaatggctctcagtccccgtggatggaggagcaaattcgggatgcctggg 
gaagcatggttctaaaaaatgttgtacgtgaaacggatgaagttggtaaaggtcagatccggatgagaactgtttttgaaca 
ggccattgatcaacgctcttcaactggtgcctggagaaatgctctttctetttgggaacctgtctgcaatgaaattttcgatcgt 
ctgattaaaccacgctgggagattagataatgaagcgtgcgcctgttattccaaaacatacgctcaatactc^accggttga 
agatacttcgttatcgacaccagctgccccgatggtggattcgttaattgcgcgcglaggagtaatggctcgcggtaatgcc 
attactttgcctgtatgtggtcgggatgtgaagtttacto^

atggtcaggtaatgaacgtgaccaggagctgcttactgaggacgcactggatgatctcatcccttcttttctactgactggtc
aacagacaccggcgttcggtcgaagagtatctggtgtcatagaaattgccgatgggagtcgccgtcgtaaagctgctgca 
cttaccgaaagtgattatcgtgttctggttggcgagctggatgatgagcagatggctgcattatccagattgggtaacgatta 
tcgcccaacaagtgcttatgaacgtggtcagcgttatgcaagccgattgcagaatgaatttgctggaaatatttctgcgctgg 
ctgatgcggaaaatatttcacgtaagattattacccgctgtatcaacaccgccaaattgcctaaatcagttgttgctcttttttct 
caccccggtgaactatctgcccggtcaggtgatgcacttcaaaaagcctttacagataaagaggaattacttaagcagcag 
gcatctaaccttcatgagcagaaaaaagctggggtgatatttgaagctgaagaagttatcactcttttaacttctgtgcttaaa 
acgtcatctgcatcaagaactagtttaagctcacgacatcagtttgctcctggagcgacagtattgtataagggcgataaaat 
ggtgcttaacctggacaggtctcgtgttccaactgagtgtatagagaaaattgaggccattcttaaggaacttgaaaagcca 
gcaccctgatgcgaccacgttttagtctacgtttatctgtctttacttaatgtcctttgttacaggccagaaagcataactggcc 
tgaatattctctctgggccagaagcttggcccactgttccacttgtatcgtcggtctgataatcagactgggaccacggtccc 
actcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgattattagtctgggaccacggt 
cccactcgtatcgtcggtctgataatcagactgggaccacggtcccactcgtatcgtcggtctgattattagtctgggaccat 
ggtcccactcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgattattagtctggaacc 
acggtcccactcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgattattagtctggg 
accacgatcccactcgtgttgtcggtctgattatcggtctgggaccacggtcccacttgtattgtcgatcagactatcagcgt 
gagactacgattccatcaatgcctgtcaagggcaagtattgacatgtcgtcgtaacctgtagaacggagtaacctcggtgtg 
cggttgtatgcctgctgtggattgctgctgtgtcctgcttatccacaacattttgcgcacggttatgtggacaaaatacctgC 
GCTAGAgaaaagagtttgtagaaacgcaaaaaggccatccgtcaggatggccttctgcttaatttgatgcctggcagt 
ttatggcgggcgtcctgcccgccaccctccgggccgttgcttcgcaacgttcaaatccgctcccggcggatttgtcctactc 
aggagagcgttcaccgacaaacaacagataaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatgcctgg 
cagttccctactctcgcatggggagaccccacactaccatcggcgctacggcgtttcacttctgagttcggcatggggtca 
ggtgggaccaccgcgctactgccgccaggcaaattctgttttatcagaccgcttctgcgttctgggccgc 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA
TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 
CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTcgtttagtgaaccgtcagatcactgaattctgacgacctactgattaacggc 
catagaggcctcctgcagaactgtcttagtgacaactatCGATTTCCACACATTATACGAGCCGAT 
GTTAATTGTCAACAGCTCATGCATGACGTCCCGGGAGCAGACAAGCCCGacc 
atggctcgagTAATACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTT 
AAGAGAGGCCTATCTGGCCAGTTAGCAGTCGAAGAAAGAAGTTTAAGAGA 
GCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCAT 
CGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTG 
ATGCCGGCCACGATGCGTCCGGCGTAGAGGATCCACAGGACGGGTGTGGT 
CGCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGA
CTGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCAT
AGAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCT 
GTCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAG 
GCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCAC 
AAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAG 
ATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGAC 
CCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC 
GCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCG 
CTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGC 
CTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATC 
GCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAG 
GCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAA 
GGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAA 
GAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT 
TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 
GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 
CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 
CTTTTatcggtgtgaaataccgcacagatgcgtaaggagaaaataccgcatcaggaaattgtaagcgttaataattcag 
aagaactcgtcaagaaggcgatagaaggcgatgcgctgcgaatcgggagcggcgataccgtaaagcacgaggaagcg 
gtcagcccattcgccgccaagctcttcagcaatatcacgggtagccaacgctatgtcctgatagcggtccgccacacccag 
ccggccacagtcgatgaatccagaaaagcggccattttccaccatgatattcggcaagcaggcatcgccatgggtcacga 
cgagatcctcgccgtcgggcatgctcgccttgagcctggcgaacagttcggctggcgcgagcccctgatgctcttcgtcc 
agatcatcctgatcgacaagaccggcttccatccgagtacgtgctcgctcgatgcgatgtttcgcttggtggtcgaatgggc 
aggtagccggatcaagcgtatgcagccgccgcattgcatcagccatgatggatactttctcggcaggagcaaggtgagat 
gacaggagatcctgccccggcacttcgcccaatagcagccagtcccttcccgcttcagtgacaacgtcgagcacagctgc 
gcaaggaacgcccgtcgtggccagccacgatagccgcgctgcctcgtcttgcagttcattcagggcaccggacaggtc- 



ggtcttgacaaaaagaaccgggcgcccctgcgctgacagccggaacacggcggcatcagagcagccgattgtctgttgt 

gaxagtcatagccgaatagcctctcracccaagcggccgga 

gat<xtfratcctgtctcttgatcagagcttgatc^ 

gcagggcttgtcaaccttaccagatAAAAGTGCTCATCATTGGAAAACGTTCAATTcTGAG
GCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCC 
CCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCA 
GCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCA 
AAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCC 
CATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTG 
ACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCT 
ATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAA 
GCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATG 
ATTGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAG 
GCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGC 
CGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGA 
CCTGTCGGGTGCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGT 
GGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTG 
AAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTC 
CTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCA 
ATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCACCAA 
GCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGT 
CGATCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAAC 
TGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTG
ACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTT
TCTGGATTCATCGACTGTGGCCGGCTGGGTGTGGCGGACCGCTATCAGGAC 
ATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCT 
GACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATC 
GCCTTCTATCGCCTTCTTGACGAGccaTTCtgctggcaggtaagtcgcagccctggcgtcgtgatt 
agtgatgatgaaccaggttatgaccttgatttattttgcatacctaatcattatgctgaggatttggaaagggtgtttattcctca 
tggactaattatggacaggactgaacgtcttgctcgagatgtgatgaaggagatgggaggccatcacattgtagccctctg 
tgtgctcaaggggggctataaattctttgctgacctgctggattacatcaaagcactgaatagaaatagtgatagatccattc 
ctatgactgtagattttatcagactgaagagctattgtaatgaccagtcaacaggggacataaaagtaattggtggagatgat 
ctctcaactttaactggaaagaatgtcttgattgtggaagatataattgacactggcaaaacaatgcagactttgctttccttg 
gtcaggcagtataatccaaagatggtcaaggtcgcaagcttgctggtgaaaaggaccccacgaagtgttggatataagcc 
agactttgttggatttgaaattccagacaagtttgttgtaggatatgcccttgactataatgaatacttcagggatttgaatcat 
gtttgtgtcattagtgaaactggaaaagcaaaatacaaagcctaaGCGGCCGCTAACCTGGTTGCTGA 
CTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGGGGA 
CTTTCCACACCCTAACTGACACACATTCCACAGCTGGTTCTTTCCGCCTCAG 
AAGGTACACAGGCGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTT 
AAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAA 
AATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCC 
AGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAG 
GGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



Puro



EBNA-1



OriP 



gatcttcaatattggccattagccatattattcattggttatatagcataaa

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA

catttatattggctcatgtccaatatgaccgccatgttggcattgattattg 

actagttattaatagtaatcaattacggggtcattagttcatagcccatat 

atggagttccgcgttacataacttacggtaaatggcccgcctggctgaccg 

cccaacgacccccgcccattgacgtcaataatgacgtatgttcccatagta 

acgccaatagggactttccattgacgtcaatgggtggagtatttacggtaa 

actgcccacttggcagtacatcaagtgtatcatatgccaagtccgccccct 

attgacgtcaatgacggtaaatggcccgcctggcattatgcccagtacatg 

accttacgggactttcctacttggcagtacatctacgtattagtcatcgct

attaccatggtgatgcggttttggcagtacaccaatgggcgtggatagcg 

gtttgactcacggggatttccaagtctccaccccattgacgtcaatgggag 

TTTGTTTTGGCACCAAAATCAAGGGGACTTTCCAAAATGTCGTAACAACTG 

CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 

GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 

ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 

ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 

TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 

ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 

tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 

ctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctct 

gctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagcta 

caacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacggg

ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc

ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 

cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 

gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 

agtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 

ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc 

ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 

agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 

attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 

cttactgaacaaccagaattagcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatga 

atcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttgg 

agaaatataaacttctgccagaatacccaggtgttctctctgatgtccaggaggagaaaggcattaagtacaaatttgaagt

atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 

actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca

gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatca 

aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt 

caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 

acgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccata 

cggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggt 

ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 

acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 

actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 

ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 

gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 

atgtcgtcttacaccattgagtcgtctcccctttggaatggcccctggacccggcccacaacctggcccgctaagggagtc 

cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 




agccxgctcctarctgcaatatcagggt^ 



ggcaggagtgatgtaacttgttaggagacgcratcaatcgtattaaaa^ 
cagtagacatcatgcgtgctgttggtgtatttctggcrat^^^ 

catacccatgttgtcacgtcactcagctccgcgctcaacaccttctcgcgttggaaaacattagcgacatttacctggtgagc 



agcagcgaaaattc&cgaxccttgggaggtggcggcatatgca^ 

ggat 



gcatatg* 



ggcgctcaggcgcaagtgtgtgtaatttgtcctccaga 
caggtattccrcggggtgccattagtggttttgtgggc^gtg^ 

gttatUcacccttattttacagtccaaaac^gcagggcggcgtgtgggggctgacgcgtgcccccactccac^ 
aaaaagagtggccacttgtctttgtttatgggccccattgg 

gtggagtccgctgctgtcggcgtccactctctttccccttgttacaaatagagtgtaacaacatggttcacctgtctt^ 



acatccagtctttacggcttgtccccaccccatggattto^^ 
-gcccaaggggtttgtgagggttatattggtgtcatagcac^ 

cgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaagg
agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 
gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 
gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 
atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 
ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt 
acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 
aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 
ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac 
atgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt 
cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC
GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAG 
TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 
GAAATTGCATCAACGCATATAGCGGTAGATCCTTGCTAGAGTCGAGATCTG 
TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 
CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCGCCTGACGAGCATCACA 
AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 
TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 
CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 
CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 
CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 
TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 
CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 
GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 
GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 
AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA

GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 

CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 

CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 

CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 

AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 

AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 

AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG 

AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 

CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 

GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 

TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 

TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 

CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 

ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 

CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 

GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 

GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 

GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 

GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 

TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 

CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 

GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt 

cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc 

acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt 

ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtg 

acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 

gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 

gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt 

ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga 

ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg 

ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 

cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 

gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc 

gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc

gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg 

gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac 

gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca 

gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 

TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 

GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 

TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 

GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 

GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 

GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 

CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



Declaration for Patent Application 



Docket Number: 1522.0030004/MAC/BJD

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, first and joint inventor (if
plural names are listed below) of the subject matter that is claimed and for which a patent is sought on the invention entitled: 
Compositions and Methods for Non-targeted Activation of Endogenous Genes, the specification of which is attached hereto 
unless the following box is checked: 



[X] was filed on March 26, 1999;

as United States Application Number or PCT International Application Number 09/276,820; and
was amended on (if applicable). 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as 
amended by any amendment referred to above. 

I acknowledge the duty to disclose information that is material to patentability as defined in 37 C.F.R. § 1.56.

I hereby claim foreign priority benefits under 35 U.S.C. § 119(a)-(d) or § 365(b) of any foreign application(s) for patent or
inventor's certificate, or § 365(a) of any PCT international application, which designated at least one country other than the
United States listed below, and have also identified below any foreign application for patent or inventor's certificate, or PCT
international application having a filing date before that of the application on which priority is claimed.

Prior Foreign Application(s) Priority Claimed

□ Yes □ No

(Application No.) (Country) (Day/Month/Year Filed)

□ Yes □ No



(Application No.) (Country) (Day/Month/Year Filed) 

I hereby claim the benefit under 35 U.S.C. § 119(e) of any United States provisional application(s) listed below.



(Application No.) (Filing Date) 



(Application No.) (Filing Date) 

I hereby claim the benefit under 35 U.S.C. § 120 of any United States application(s), or under § 365(c) of any PCT
international application designating the United States, listed below and, insofar as the subject matter of each of the claims
of this application is not disclosed in the prior United States or PCT international application in the manner provided by the
first paragraph of 35 U.S.C. § 112, I acknowledge the duty to disclose information that is material to patentability as defined
in 37 C.F.R. § 1.56 that became available between the filing date of the prior application and the national or PCT
international filing date of this application.



09/263,814    March 8, 1999    Pending
(Application No.) (Filing Date) (Status - patented, pending, abandoned)

09/253,022    February 19, 1999    Pending
(Application No.) (Filing Date) (Status - patented, pending, abandoned)

09/159,643    September 24, 1998    Abandoned
(Application No.) (Filing Date) (Status - patented, pending, abandoned)

08/941,223    September 26, 1997    Abandoned
(Application No.) (Filing Date) (Status - patented, pending, abandoned)



Send Correspondence to: 



Appl. No. 09/276,820
Docket No. 1522.0030004/MAC/BJD 



STERNE, KESSLER, GOLDSTEIN & FOX P.L.L.C. 
1100 New York Avenue, N.W.

Suite 600 
Washington, D.C. 20005-3934 

Direct Telephone Calls to: 

(202) 371-2600 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information 
and belief are believed to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the
United States Code and that such willful false statements may jeopardize the validity of the application or any patent issued 
thereon. 



Full name of sole or first inventor John J. HARRINGTON 


Signature of sole or first inventor




Residence


Citizenship


Post Office Address




Full name of second inventor Bruce SHERF



Signature of second inventor


Residence


Citizenship


Post Office Address



(Supply similar information and signature for subsequent joint inventors, if any) 



Attorney Docket No. 5812=2 PATENT

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

First Named Inventor: Harrington et al. Group Art Unit: 1632

Application No.: 09/276,820 Examiner Name: Shukla, R.

Filed: March 26, 1999

For: COMPOSITIONS AND METHODS FOR
NON-TARGETED ACTIVATION OF
ENDOGENOUS GENES

Assistant Commissioner for Patents
Washington, DC 20231

REVOCATION OF POWER OF ATTORNEY
AND NEW POWER OF ATTORNEY



Assignee hereby revokes all powers of attorney previously granted with respect to the 
above-identified patent application, and appoints the practitioners associated with the Customer 
Number provided below to prosecute this application and to transact all business in the Patent and 
Trademark Office connected therewith, and directs that all correspondence be addressed to that
Customer Number:

Customer Number 000826 
with full power of substitution and revocation to transact all business in the Patent and Trademark 
Office in connection therewith.

Please direct all communications to the attention of: 
Anne Brown
Registration No. 36,463
Tel Raleigh Office (919) 420-2200
Fax Raleigh Office (919) 420-2260

Assignee hereby elects under 37 C.F.R. § 3.71 to prosecute this patent application and

certifies that it is the assignee of the entire right, title, and interest in the patent application 

identified above, and in any divisional or continuations thereof, by virtue of: 

An assignment from the inventors of the patent application identified above. 
The assignment was recorded in the Patent and Trademark Office at Reel 010064 
Frame 0420.



In re: Harrington et al.
Appl. No.: 09/276,820
Filed: March 26, 1999
Page 2



The undersigned (whose title is supplied below) is empowered to sign this certificate on behalf of the assignee.



ATHERSYS

By: . 



James J, Kovach 



(Print or Type name of person signing)
Title: Chief Operating Officer
Date:



ALSTON & BIRD LLP

Post Office Drawer 34009 
Charlotte, NC 28234-4009 
Tel Raleigh Office (919) 420-2200
Fax Raleigh Office (919) 420-2260 



CERTIFICATION OF TRANSMISSION
I hereby certify that this paper is being facsimile transmitted to the Patent and Trademark Office.



CERTIFICATE OF MAILING

I hereby certify that this correspondence is being deposited with the
United States Postal Service with sufficient postage as first class mail
in an envelope addressed to: Assistant Commissioner for Patents,
Washington, DC 20231.



Attorney's Docket No. 5817-7L PATENT

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re: Harrington, et al. Group Art Unit: Not Yet Assigned 

Appl. No.: Not Yet Assigned Examiner: Not Yet Assigned 

Filed: Filed Concurrently Herewith 

For: COMPOSITIONS AND METHODS FOR NON-TARGETED ACTIVATION 

OF ENDOGENOUS GENES 

January 18, 2000 

REQUEST FOR TRANSFER OF COMPUTER READABLE FORM OF SEQUENCE 
LISTING UNDER 37 CFR §1.821(e) AND MPEP 2422.05 

Box Patent Application 

Assistant Commissioner for Patents 

Washington, DC 20231 

Sir: 

Applicants hereby request transfer of previously filed sequence information into the above- 
mentioned application, concurrently filed herewith. 

I hereby state that the paper copy of the sequence listing, attached hereto, is identical to 
the computer-readable copy of the sequence listing filed in U.S. Application Serial No. 
09/276,820, filed on March 26, 1999. In accordance with 37 CFR § 1.821(e) and MPEP 2422.05, 
please use the computer-readable form filed in that application as the computer-readable form for 
the above-mentioned application. It is understood that the Patent and Trademark Office will 
make the necessary change in application number and filing date for the present application. 

Respectfully submitted, 

Anne Brown 
Attorney for Applicant 
Registration No. 36,463 

ALSTON & BIRD LLP 

Post Office Drawer 34009 

Charlotte, NC 28234 

Tel Raleigh Office (919) 420-2200 

Fax Raleigh Office (919) 420-2260 

"Express Mail" Mailing Label Number EL247263380US 
Date of Deposit: January 18, 2000 

I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under 37
addressed to Box Patent Application, Assistant Commissioner of Patents, Washington, DC 20231.





SEQUENCE LISTING 



<110> Harrington, John J.
Sherf, Bruce
Rundlett, Stephen 

<120> Compositions and Methods for Non-targeted Activation of Endogenous 
Genes 

<130> 1522.0030004/MAC/BJD

<140> To be assigned
<141> 1999-03-26 

<150> To be assigned 
<151> 1999-03-08 

<150> 09/253,022 
<151> 1999-02-19 

<150> 09/159,643 
<151> 1998-09-24 

<150> 08/941,223 
<151> 1997-09-26 

<160> 17 

<170> PatentIn Ver. 2.0

<210> 1 

<211> 39 

<212> DNA 

<213> Homo sapiens 



<400> 1 

tccttcgaag cttgtcatgg ttggttcgct aaactgcat 



39 
<210> 2

<211> 40 

<212> DNA 

<213> Homo sapiens 

<400> 2 

aaacttaaga tcgattaatc attcttctca tatacttcaa 

<210> 3 

<211> 28 

<212> DNA 

<213> Homo sapiens 

<400> 3 

atccaccatg gctacaggtg agtactcg 

<210> 4 

<211> 36 

<212> DNA 

<213> Homo sapiens 

<400> 4 

gatccgagta ctcacctgta gccatggtgg atttaa 

<210> 5 

<211> 33 

<212> DNA 

<213> Homo sapiens 

<400> 5 

ggcgagatct agcgctatat gcgttgatgc aat 

<210> 6 

<211> 51 

<212> DNA 

<213> Homo sapiens 
<400> 6

ggccagatct gctaccttaa gagagccgaa acaagcgctc atgagcccga a 



<210> 7 

<211> 6084 

<212> DNA 

<213> Homo sapiens 

<400> 7 



agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120 
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180 
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300 
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480 
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540 
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600 
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720 
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780 
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840 
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900 
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960 
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020 
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080 
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140 
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200 
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260 
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320 
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380 
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440 
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500 
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560 
tgtcagctgt ccagagaaag ggatccaggt gagtagggcc cgatccttct agagtcgagc 1620 
tctcttaagg tagcaaggtt acaagacagg tttaaggaga ccaatagaaa ctgggcttgt 1680 



cgagacagag aagactcttg cgtttctgat aggcacctat tggtcttacg cggccgcgaa 1740 
ttccaagctt gagtattcta tcgtgtcacc taaataactt ggcgtaatca tggtcatatc 1800
tgtttcctgt gtgaaattgt tatccgctca caattccaca caacatacga gccggaagca 1860 
taaagtgtaa agcctggggt gcctaatgag tgagctaact cacattaatt gcgttgcgcg 1920
atgcttccat tttgtgaggg ttaatgcttc gagaagacat gataagatac attgatgagt 1980 
ttggacaaac cacaacaaga atgcagtgaa aaaaatgctt tatttgtgaa atttgtgatg 2040 
ctattgcttt atttgtaacc attataagct gcaataaaca agttaacaac aacaattgca 2100 
ttcattttat gtttcaggtt cagggggaga tgtgggaggt tttttaaagc aagtaaaacc 2160 
tctacaaatg tggtaaaatc cgataaggat cgattccgga gcctgaatgg cgaatggacg 2220 
cgccctgtag cggcgcatta agcgcggcgg gtgtggtggt tacgcgcacg tgaccgctac 2280 
acttgccagc gccctagcgc ccgctccttt cgctttcttc ccttcctttc tcgccacgtt 2340 
cgccggcttt ccccgtcaag ctctaaatcg ggggctccct ttagggttcc gatttagtgc 2400 
tttacggcac ctcgacccca aaaaacttga ttagggtgat ggttcacgta gtgggccatc 2460 
gccctgatag acggtttttc gccctttgac gttggagtcc acgttcttta atagtggact 2520 
cttgttccaa actggaacaa cactcaaccc tatctcggtc tattcttttg atttataagg 2580 
gattttgccg atttcggcct attggttaaa aaatgagctg atttaacaaa aatttaacgc 2640 
gaattttaac aaaatattaa cgcttacaat ttcgcctgtg taccttctga ggcggaaaga 2700
accagctgtg gaatgtgtgt cagttagggt gtggaaagtc cccaggctcc ccagcaggca 2760 
gaagtatgca aagcatgcat ctcaattagt cagcaaccag gtgtggaaag tccccaggct 2820 
ccccagcagg cagaagtatg caaagcatgc atctcaatta gtcagcaacc atagtcccgc 2880 
ccctaactcc gcccatcccg cccctaactc cgcccagttc cgcccattct ccgccccatg 2940
gctgactaat tttttttatt tatgcagagg ccgaggccgc ctcggcctct gagctattcc 3000
agaagtagtg aggaggcttt tttggaggcc taggcttttg caaaaagctt gattcttctg 3060 
acacaacagt ctcgaactta aggctagagc caccatgatt gaacaagatg gattgcacgc 3120 
aggttctccg gccgcttggg tggagaggct attcggctat gactgggcac aacagacaat 3180
cggctgctct gatgccgccg tgttccggct gtcagcgcag gggcgcccgg ttctttttgt 3240 
caagaccgac ctgtccggtg ccctgaatga actgcaggac gaggcagcgc ggctatcgtg 3300 
gctggccacg acgggcgttc cttgcgcagc tgtgctcgac gttgtcactg aagcgggaag 3360
ggactggctg ctattgggcg aagtgccggg gcaggatctc ctgtcatctc accttgctcc 3420 
tgccgagaaa gtatccatca tggctgatgc aatgcggcgg ctgcatacgc ttgatccggc 3480
tacctgccca ttcgaccacc aagcgaaaca tcgcatcgag cgagcacgta ctcggatgga 3540
agccggtctt gtcgatcagg atgatctgga cgaagagcat caggggctcg cgccagccga 3600 
actgttcgcc aggctcaagg cgcgcatgcc cgacggcgag gatctcgtcg tgacccatgg 3660 
cgatgcctgc ttgccgaata tcatggtgga aaatggccgc ttttctggat tcatcgactg 3720 
tggccggctg ggtgtggcgg accgctatca ggacatagcg ttggctaccc gtgatattgc 3780 
tgaagagctt ggcggcgaat gggctgaccg cttcctcgtg ctttacggta tcgccgctcc 3840 
cgattcgcag cgcatcgcct tctatcgcct tcttgacgag ttcttctgag cgggactctg 3900



gggttcgaaa tgaccgacca agcgacgccc aacctgccat cacgatggcc gcaataaaat 3960
atctttattt tcattacatc tgtgtgttgg ttttttgtgt gaagatccgc gtatggtgca 4020
ctctcagtac aatctgctct gatgccgcat agttaagcca gccccgacac ccgccaacac 4080
ccgctgacgc gccctgacgg gcttgtctgc tcccggcatc cgcttacaga caagctgtga 4140
ccgtctccgg gagctgcatg tgtcagaggt tttcaccgtc atcaccgaaa cgcgcgagac 4200
gaaagggcct cgtgatacgc ctatttttat aggttaatgt catgataata atggtttctt 4260
agacgtcagg tggcactttt cggggaaatg tgcgcggaac ccctatttgt ttatttttct 4320
aaatacattc aaatatgtat ccgctcatga gacaataacc ctgataaatg cttcaataat 4380
attgaaaaag gaagagtatg agtattcaac atttccgtgt cgcccttatt cccttttttg 4440
cggcattttg ccttcctgtt tttgctcacc cagaaacgct ggtgaaagta aaagatgctg 4500
aagatcagtt gggtgcacga gtgggttaca tcgaactgga tctcaacagc ggtaagatcc 4560
ttgagagttt tcgccccgaa gaacgttttc caatgatgag cacttttaaa gttctgctat 4620
gtggcgcggt attatcccgt attgacgccg ggcaagagca actcggtcgc cgcatacact 4680
attctcagaa tgacttggtt gagtactcac cagtcacaga aaagcatctt acggatggca 4740
tgacagtaag agaattatgc agtgctgcca taaccatgag tgataacact gcggccaact 4800
tacttctgac aacgatcgga ggaccgaagg agctaaccgc ttttttgcac aacatggggg 4860
atcatgtaac tcgccttgat cgttgggaac cggagctgaa tgaagccata ccaaacgacg 4920
agcgtgacac cacgatgcct gtagcaatgg caacaacgtt gcgcaaacta ttaactggcg 4980
aactacttac tctagcttcc cggcaacaat taatagactg gatggaggcg gataaagttg 5040
caggaccact tctgcgctcg gcccttccgg ctggctggtt tattgctgat aaatctggag 5100
ccggtgagcg tgggtctcgc ggtatcattg cagcactggg gccagatggt aagccctccc 5160
gtatcgtagt tatctacacg acggggagtc aggcaactat ggatgaacga aatagacaga 5220
tcgctgagat aggtgcctca ctgattaagc attggtaact gtcagaccaa gtttactcat 5280
atatacttta gattgattta aaacttcatt tttaatttaa aaggatctag gtgaagatcc 5340
tttttgataa tctcatgacc aaaatccctt aacgtgagtt ttcgttccac tgagcgtcag 5400
accccgtaga aaagatcaaa ggatcttctt gagatccttt ttttctgcgc gtaatctgct 5460
gcttgcaaac aaaaaaacca ccgctaccag cggtggtttg tttgccggat caagagctac 5520
caactctttt tccgaaggta actggcttca gcagagcgca gataccaaat actgtccttc 5580
tagtgtagcc gtagttaggc caccacttca agaactctgt agcaccgcct acatacctcg 5640
ctctgctaac cctgttacca gtggctgctg ccagtggcga taagtcgtgt cttaccgggt 5700
tggactcaag acgatagtta ccggataagg cgcagcggtc gggctgaacg gggggttcgt 5760
gcacacagcc cagcttggag cgaacgacct acaccgaact gagataccta cagcgtgagc 5820
tatgagaaag cgccacgctt cccgaaggga gaaaggcgga caggtatccg gtaagcggca 5880
gggtcggaac aggagagcgc acgagggagc ttccaggggg aaacgcctgg tatctttata 5940
gtcctgtcgg gtttcgccac ctctgacttg agcgtcgatt tttgtgatgc tcgtcagggg 6000
ggcggagcct atggaaaaac gccagcaacg cggccttttt acggttcctg gccttttgct 6060
ggccttttgc tcacatggct cgac 6084
<210> 8

<211> 6085 

<212> DNA 

<213> Homo sapiens 



<400> 8 

agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60 
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120 
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180 
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240 
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840 
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900 
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960 
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020 
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080 
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140 
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200 
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320 
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440 
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500 
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560 
tgtcagctgt ccagagaaag ggatcccagg tgagtagggc ccgatccttc tagagtcgag 1620
ctctcttaag gtagcaaggt tacaagacag gtttaaggag accaatagaa actgggcttg 1680
tcgagacaga gaagactctt gcgtttctga taggcaccta ttggtcttac gcggccgcga 1740
attccaagct tgagtattct atcgtgtcac ctaaataact tggcgtaatc atggtcatat 1800



ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccggaagc 1860
ataaagtgta aagcctgggg tgcctaatga gtgagctaac tcacattaat tgcgttgcgc 1920
gatgcttcca ttttgtgagg gttaatgctt cgagaagaca tgataagata cattgatgag 1980
tttggacaaa ccacaacaag aatgcagtga aaaaaatgct ttatttgtga aatttgtgat 2040
gctattgctt tatttgtaac cattataagc tgcaataaac aagttaacaa caacaattgc 2100
attcatttta tgtttcaggt tcagggggag atgtgggagg ttttttaaag caagtaaaac 2160
ctctacaaat gtggtaaaat ccgataagga tcgattccgg agcctgaatg gcgaatggac 2220
gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcac gtgaccgcta 2280
cacttgccag cgccctagcg cccgctcctt tcgctttctt cccttccttt ctcgccacgt 2340
tcgccggctt tccccgtcaa gctctaaatc gggggctccc tttagggttc cgatttagtg 2400
ctttacggca cctcgacccc aaaaaacttg attagggtga tggttcacgt agtgggccat 2460
cgccctgata gacggttttt cgccctttga cgttggagtc cacgttcttt aatagtggac 2520
tcttgttcca aactggaaca acactcaacc ctatctcggt ctattctttt gatttataag 2580
ggattttgcc gatttcggcc tattggttaa aaaatgagct gatttaacaa aaatttaacg 2640
cgaattttaa caaaatatta acgcttacaa tttcgcctgt gtaccttctg aggcggaaag 2700
aaccagctgt ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctc cccagcaggc 2760
agaagtatgc aaagcatgca tctcaattag tcagcaacca ggtgtggaaa gtccccaggc 2820
tccccagcag gcagaagtat gcaaagcatg catctcaatt agtcagcaac catagtcccg 2880
cccctaactc cgcccatccc gcccctaact ccgcccagtt ccgcccattc tccgccccat 2940
ggctgactaa ttttttttat ttatgcagag gccgaggccg cctcggcctc tgagctattc 3000
cagaagtagt gaggaggctt ttttggaggc ctaggctttt gcaaaaagct tgattcttct 3060
gacacaacag tctcgaactt aaggctagag ccaccatgat tgaacaagat ggattgcacg 3120
caggttctcc ggccgcttgg gtggagaggc tattcggcta tgactgggca caacagacaa 3180
tcggctgctc tgatgccgcc gtgttccggc tgtcagcgca ggggcgcccg gttctttttg 3240
tcaagaccga cctgtccggt gccctgaatg aactgcagga cgaggcagcg cggctatcgt 3300
ggctggccac gacgggcgtt ccttgcgcag ctgtgctcga cgttgtcact gaagcgggaa 3360
gggactggct gctattgggc gaagtgccgg ggcaggatct cctgtcatct caccttgctc 3420
ctgccgagaa agtatccatc atggctgatg caatgcggcg gctgcatacg cttgatccgg 3480
ctacctgccc attcgaccac caagcgaaac atcgcatcga gcgagcacgt actcggatgg 3540
aagccggtct tgtcgatcag gatgatctgg acgaagagca tcaggggctc gcgccagccg 3600
aactgttcgc caggctcaag gcgcgcatgc ccgacggcga ggatctcgtc gtgacccatg 3660
gcgatgcctg cttgccgaat atcatggtgg aaaatggccg cttttctgga ttcatcgact 3720
gtggccggct gggtgtggcg gaccgctatc aggacatagc gttggctacc cgtgatattg 3780
ctgaagagct tggcggcgaa tgggctgacc gcttcctcgt gctttacggt atcgccgctc 3840
ccgattcgca gcgcatcgcc ttctatcgcc ttcttgacga gttcttctga gcgggactct 3900
ggggttcgaa atgaccgacc aagcgacgcc caacctgcca tcacgatggc cgcaataaaa 3960
tatctttatt ttcattacat ctgtgtgttg gttttttgtg tgaagatccg cgtatggtgc 4020



actctcagta caatctgctc tgatgccgca 
cccgctgacg cgccctgacg ggcttgtctg 
accgtctccg ggagctgcat gtgtcagagg 
cgaaagggcc tcgtgatacg cctattttta 
tagacgtcag gtggcacttt tcggggaaat 
taaatacatt caaatatgta tccgctcatg 
tattgaaaaa ggaagagtat gagtattcaa 
gcggcatttt gccttcctgt ttttgctcac 
gaagatcagt tgggtgcacg agtgggttac 
cttgagagtt ttcgccccga agaacgtttt 
tgtggcgcgg tattatcccg tattgacgcc 
tattctcaga atgacttggt tgagtactca 
atgacagtaa gagaattatg cagtgctgcc 
ttacttctga caacgatcgg aggaccgaag 
gatcatgtaa ctcgccttga tcgttgggaa 
gagcgtgaca ccacgatgcc tgtagcaatg 
gaactactta ctctagcttc ccggcaacaa 
gcaggaccac ttctgcgctc ggcccttccg 
gccggtgagc gtgggtctcg cggtatcatt 
cgtatcgtag ttatctacac gacggggagt 
atcgctgaga taggtgcctc actgattaag 
tatatacttt agattgattt aaaacttcat 
ctttttgata atctcatgac caaaatccct 
gaccccgtag aaaagatcaa aggatcttct 
tgcttgcaaa caaaaaaacc accgctacca 
ccaactcttt ttccgaaggt aactggcttc 
ctagtgtagc cgtagttagg ccaccacttc 
gctctgctaa tcctgttacc agtggctgct 
ttggactcaa gacgatagtt accggataag 
tgcacacagc ccagcttgga gcgaacgacc 
ctatgagaaa gcgccacgct tcccgaaggg 
agggtcggaa caggagagcg cacgagggag 
agtcctgtcg ggtttcgcca cctctgactt 
gggcggagcc tatggaaaaa cgccagcaac 
tggcctxttg ctcacatggc tcgac 
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tagttaagcc agccccgaca cccgccaaca 4080 
ctcccggcat ccgcttacag acaagctgtg 4140 
ttttcaccgt catcaccgaa acgcgcgaga 4200 
taggttaatg tcatgataat aatggtttct 4260 
gtgcgcggaa cccctatttg tttatttttc 4320 
agacaataac cctgataaat gcttcaataa 4380 
catttccgtg tcgcccttat tccctttttt 444 0 
ccagaaacgc tggtgaaagt aaaagatgct 4 500 
atcgaactgg atctcaacag cggtaagatc 4560 
ccaatgatga gcacttttaa agttctgcta 4620 
gggcaagagc aactcggtcg ccgcatacac 468 0 
ccagtcacag . aaaagcatct tacggatggc 474 0 
ataaccatga gtgataacac tgcggccaac 4800 
gagctaaccg" cttttttgca caacatgggg 486 0 
ccggagctga atgaagccat accaaacgac 4 920 
gcaacaacgt tgcgcaaact attaactggc 4 980 
ttaatagact ggatggaggc ggataaagtt 5040 
gctggctggt ttattgctga taaatctgga 5100 
gcagcactgg ggccagatgg taagccctcc 5160 
caggcaacta tggatgaacg aaatagacag 5220 
cattggtaac tgtcagacca agtttactca 5280 
ttttaattta aaaggatcta ggtgaagatc 534 0 
taacgtgagt tttcgttcca ctgagcgtca 5400 
tgagatcctt tttttctgcg cgtaatctgc 5460 
gcggtggttt gtttgccgga tcaagagcta 5520 
agcagagcgc agataccaaa tactgtcctt 5580 
aagaactctg tagcaccgcc tacatacctc 5640 
gccagtggcg ataagtcgtg tcttaccggg 5700 
gcgcagcggt cgggctgaac ggggggttcg 5760
tacaccgaac tgagatacct acagcgtgag 5820
agaaaggcgg acaggtatcc ggtaagcggc 5880
cttccagggg gaaacgcctg gtatctttat 5940
gagcgtcgat ttttgtgatg ctcgtcaggg 6000 
gcggcctttt tacggttcct ggccttttgc 6060 
6085 



<210> 9






<211> 6086 
<212> DNA 
<213> Homo sapiens 

<400> 9 

agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60 
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120 
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300 
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360 
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600 
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660 
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900 
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960 
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200 
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260 
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320 
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440 
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500 
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560 
tgtcagctgt ccagagaaag ggatccacag gtgagtaggg cccgatcctt ctagagtcga 1620 
gctctcttaa ggtagcaagg ttacaagaca ggtttaagga gaccaataga aactgggctt 1680 
gtcgagacag agaagactct tgcgtttctg ataggcacct attggtctta cgcggccgcg 1740 
aattccaagc ttgagtattc tatcgtgtca cctaaataac ttggcgtaat catggtcata 1800
tctgtttcct gtgtgaaatt gttatccgct cacaattcca cacaacatac gagccggaag 1860 
cataaagtgc aaagcctggg gtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg 1920 
cgatgcttcc attttgtgag ggttaatgct tcgagaagac atgataagat acattgatga 1980
gtttggacaa accacaacaa gaatgcagtg aaaaaaatgc tttatttgtg aaatttgtga 2040 
tgctattgct ttatttgtaa ccattataag ctgcaataaa caagttaaca acaacaattg 2100 
cattcatttt atgtttcagg ttcaggggga gatgtgggag gttttttaaa gcaagtaaaa 2160 
cctctacaaa tgtggtaaaa tccgataagg atcgattccg gagcctgaat ggcgaatgga 2220 
cgcgccctgt agcggcgcat taagcgcggc gggtgtggtg gttacgcgca cgtgaccgct 2280 
acacttgcca gcgccctagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 2340 
ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt ccgatttagt 2400 
gctttacggc acctcgaccc caaaaaactt gattagggtg atggttcacg tagtgggcca 2460 
tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 2520 
ctcttgttcc aaactggaac aacactcaac cctatctcgg tctattcttt tgatttataa 2580 
gggattttgc cgatttcggc ctattggtta aaaaatgagc tgatttaaca aaaatttaac 2640 
gcgaatttta acaaaatatt aacgcttaca atttcgcctg tgtaccttct gaggcggaaa 2700 
gaaccagctg tggaatgtgt gtcagttagg gtgtggaaag tccccaggct ccccagcagg 2760
cagaagtatg caaagcatgc atctcaatta gtcagcaacc aggtgtggaa agtccccagg 2820
ctccccagca ggcagaagta tgcaaagcat gcatctcaat tagtcagcaa ccatagtccc 2880
gcccctaact ccgcccatcc cgcccctaac tccgcccagt tccgcccatt ctccgcccca 2940
tggctgacta atttttttta tttatgcaga ggccgaggcc gcctcggcct ctgagctatt 3000
ccagaagtag tgaggaggct tttttggagg cctaggcttt tgcaaaaagc ttgattcttc 3060
tgacacaaca gtctcgaact taaggctaga gccaccatga ttgaacaaga tggattgcac 3120
gcaggttctc cggccgcttg ggtggagagg ctattcggct atgactgggc acaacagaca 3180
atcggctgct ctgatgccgc cgtgttccgg ctgtcagcgc aggggcgccc ggttcttttt 3240
gtcaagaccg acctgtccgg tgccctgaat gaactgcagg acgaggcagc gcggctatcg 3300 
tggctggcca cgacgggcgt tccttgcgca gctgtgctcg acgttgtcac tgaagcggga 3360 
agggactggc tgctattggg cgaagtgccg gggcaggatc tcctgtcatc tcaccttgct 3420 
cctgccgaga aagtatccat catggctgat gcaatgcggc ggctgcatac gcttgatccg 3480 
gctacctgcc cattcgacca ccaagcgaaa catcgcatcg agcgagcacg tactcggatg 3540
gaagccggtc ttgtcgatca ggatgatctg gacgaagagc atcaggggct cgcgccagcc 3600
gaactgttcg ccaggctcaa ggcgcgcatg cccgacggcg aggatctcgt cgtgacccat 3660 
ggcgatgcct gcttgccgaa tatcatggtg gaaaatggcc gcttttctgg attcatcgac 3720 
tgtggccggc tgggtgtggc ggaccgctat caggacatag cgttggctac ccgtgatatt 3780
gctgaagagc ttggcggcga atgggctgac cgcttcctcg tgctttacgg tatcgccgct 3840
cccgattcgc agcgcatcgc cttctatcgc cttcttgacg agttcttctg agcgggactc 3900 
tggggttcga aatgaccgac caagcgacgc ccaacctgcc atcacgatgg ccgcaataaa 3960
atatctttat tttcattaca tctgtgtgtt ggttttttgt gtgaagatcc gcgtatggtg 4020 
cactctcagt acaatctgct ctgatgccgc atagttaagc cagccccgac acccgccaac 4080
acccgctgac gcgccctgac gggcttgtct gctcccggca tccgcttaca gacaagctgt 4140



gaccgtctcc gggagctgca tgtgtcagag 
acgaaagggc ctcgtgatac gcctattttt 
ttagacgtca ggtggcactt ttcggggaaa 
ctaaatacat tcaaatatgt atccgctcat 
atattgaaaa aggaagagta tgagtattca 
tgcggcattt tgccttcctg tttttgctca 
tgaagatcag ttgggtgcac gagtgggtta 
ccttgagagt tttcgccccg aagaacgttt 
atgtggcgcg gtattatccc gtattgacgc 
ctattctcag aatgacttgg ttgagtactc 
catgacagta agagaattat gcagtgctgc 
cttacttctg acaacgatcg gaggaccgaa 
ggatcatgta actcgccttg atcgttggga 
cgagcgtgac accacgatgc ctgtagcaat 
cgaactactt actctagctt cccggcaaca 
tgcaggacca cttctgcgct cggcccttcc 
agccggtgag cgtgggtctc gcggtatcat 
ccgtatcgta gttatctaca cgacggggag 
gatcgctgag ataggtgcct cactgattaa 
atatatactt tagattgatt taaaacttca 
cctttttgat aatctcatga ccaaaatccc 
agaccccgta gaaaagatca aaggatcttc 
ctgcttgcaa acaaaaaaac caccgctacc 
accaactctt tttccgaagg taactggctt 
tctagtgtag ccgtagttag gccaccactt 
cgctctgcta atcctgttac cagtggctgc 
gttggactca agacgatagt taccggataa 
gtgcacacag cccagcttgg agcgaacgac 
gctatgagaa agcgccacgc ttcccgaagg 
cagggtcgga acaggagagc gcacgaggga 
tagtcctgtc gggtttcgcc acctctgact 
ggggcggagc ctatggaaaa acgccagcaa 
ctggcctttt gctcacatgg ctcgac 
gttttcaccg tcatcaccga aacgcgcgag 4200
ataggttaat gtcatgataa taatggtttc 4260 
tgtgcgcgga acccctattt gtttattttt 4320 
gagacaataa ccctgataaa tgcttcaata 4380 
acatttccgt gtcgccctta ttcccttttt 4440 
cccagaaacg ctggtgaaag taaaagatgc 4500 
catcgaactg gatctcaaca gcggtaagat 4560 
tccaatgatg agcactttta aagttctgct 4620 
cgggcaagag caactcggtc gccgcataca 4680 
accagtcaca gaaaagcatc ttacggatgg 4740 
cataaccatg agtgataaca ctgcggccaa 4800 
ggagctaacc gcttttttgc acaacatggg 4860 
accggagctg aatgaagcca taccaaacga 4920
ggcaacaacg ttgcgcaaac tattaactgg 4980
attaatagac tggatggagg cggataaagt 5040
ggctggctgg tttattgctg ataaatctgg 5100 
tgcagcactg gggccagatg gtaagccctc 5160 
tcaggcaact atggatgaac gaaatagaca 5220 
gcattggtaa ctgtcagacc aagtttactc 5280 
tttttaattt aaaaggatct aggtgaagat 5340 
ttaacgtgag ttttcgttcc actgagcgtc 5400 
ttgagatcct ttttttctgc gcgtaatctg 5460 
agcggtggtt tgtttgccgg atcaagagct 5520 
cagcagagcg cagataccaa atactgtcct 5580
caagaactct gtagcaccgc ctacatacct 5640 
tgccagtggc gataagtcgt gtcttaccgg 5700 
ggcgcagcgg tcgggctgaa cggggggttc 5760 
ctacaccgaa ctgagatacc tacagcgtga 5820
gagaaaggcg gacaggtatc cggtaagcgg 5880 
gcttccaggg ggaaacgcct ggtatcttta 5940 
tgagcgtcga tttttgtgat gctcgtcagg 6000 
cgcggccttt ttacggttcc tggccttttg 6060 
6086 



<210> 10 
<211> 38 
<212> DNA 
<213> Artificial sequence
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 10 

tttttttttt ttcgtcagcg gccgcatcnn nntttatt 38 

<210> 11 
<211> 25 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 11 

cagatcacta gaagctttat tgcgg 25 

<210> 12 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 12 

ttttcgtcag cggccgcatc 20

<210> 13 
<211> 45 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 13

actcataggc catagaggcc tatcacagtt aaattgctaa cgcag 45 

<210> 14 
<211> 43 
<212> DNA 

<213> Artificial sequence 
<221> OTHER 
<222> 1 

<223> 5' cytosine at position #1 is biotinylated

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 14

ctcgtttagt gcggccgctc agatcactga attctgacga cct 43 

<210> 15 
<211> 41 
<212> DNA 

<213> Artificial sequence 

<221> OTHER 
<222> 1 

<223> 5' cytosine at position #1 is biotinylated 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 15 

ctcgtttagt ggcgcgccag atcactgaat tctgacgacc t 41 

<210> 16 
<211> 22 
<212> DNA 

<213> Artificial sequence 
<221> OTHER 

<223> Description of artificial sequence: synthetic oligonucleotide 



<400> 16 



gacctactga ttaacggcca ta 
<210> 17
<211> 20 
<212> DNA 

<213> Artificial sequence 

<221> OTHER 
<222> 1 

<223> 3' thymidine at position #20 is biotinylated 

<223> Description of artificial sequence: synthetic oligonucleotide 



<400> 17 

tcgtcagaat tcagtgatct 